@Pengpeng,
> Yes, I have verified the fallback path worked after PR #229 applied. so it
> should fix this issue.
OK. That's good news. I think for local datasources, the fallback option
works quite well.
>
> I do NOT get "...DataSourceOVF does not persist an obj.pkl". From what I
> saw, after loaded DataSourceOVF, obj.pkl file was created under instance
> path. Doesn't the fallback ds load from obj.pkl?
Sorry. I should have been more clear. While the obj.pkl is written for
local datasources, it's only used in transitioning between local and net
stages:
For all datasources, the cloud-init boot sequence does the following:
1) cloud-init init --local purges the object cache unless it is told
   not to (manual cache clean). This removes the finished boot flag
   and sets the 'existing' variable to 'check'; this is important for
   the next stage of cloud-init (cloud-init init).
     cloudinit/cmd/main.py:main_init lines 305 -> 317
   Now, cloud-init init --local fetches the datasource with
   existing='check':
     cloudinit/stages.py:fetch() line 349
     cloudinit/stages.py:_get_data_sources(existing='check') line 236
     cloudinit/stages.py:_restore_from_checked_cache(existing='check') line 211
     cloudinit/stages.py:_restore_from_cache() line 184
     cloudinit/stages.py:_pkl_load() line 943
   In reboot scenarios, the datasource is reloaded from the cache and
   cloud-init compares the instance-id in /var/lib/cloud/data/instance-id
   to the ds.get_instance_id() value. If they match, the datasource is
   restored from cache. If they do not match, the datasource from cache
   (obj.pkl) is ignored, and cloud-init walks the local datasources,
   calling ._get_data() on each until a local datasource is found.
   Once a datasource is found (one of the datasources' _get_data()
   methods returns True), self.datasource is set and we do persist an
   obj.pkl. cloud-init init --local then exits.
2) cloud-init runs again after networking is up. This time, in
   main.py:main_init, the ds mode is NET, which sets existing='trust'.
   The idea here is that cloud-init local mode detects and finds the
   correct datasource, so at the NET stage we don't need to look for a
   datasource again if one was found.
   *Network only* datasources (Ec2 for example), which can only be
   detected by checking a network endpoint (for example
   http://169.254.169.254), cannot be checked until networking is up,
   so we don't bother checking them in (1) and wait until this stage.
   When cloud-init attempts _restore_from_checked_cache() with
   existing='trust', the _pkl_load() succeeds.
   Next, cloud-init looks up the on-disk instance-id:
     /var/lib/cloud/data/instance-id
   It then calls ds.get_instance_id(); if the two match, we're on the
   same instance as we were before. For local datasources this is
   trivially true, since when we found the datasource in (1) we
   persisted the object and we've just loaded it.
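The two stages above can be modelled with a small toy. This is only a sketch of the flow as described in this comment, not cloud-init's code: ToyDataSource, persist(), restore_from_cache() and fetch() are hypothetical names, the cache pickles a plain dict rather than a datasource object, and the directory layout under the given root is an assumption made for illustration.

```python
import os
import pickle


class ToyDataSource:
    """Hypothetical stand-in for a local datasource (not a cloud-init class)."""

    def __init__(self, name, instance_id, detectable=True):
        self.name = name
        self.instance_id = instance_id
        self.detectable = detectable

    def _get_data(self):
        # True means "the platform was detected on this boot".
        return self.detectable

    def get_instance_id(self):
        return self.instance_id


def _paths(root):
    # Assumed layout for the toy: <root>/instance/obj.pkl and
    # <root>/data/instance-id.
    return (os.path.join(root, "instance", "obj.pkl"),
            os.path.join(root, "data", "instance-id"))


def persist(root, ds):
    """Write the toy object cache and the on-disk instance-id."""
    pkl_path, iid_path = _paths(root)
    for path in (pkl_path, iid_path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(pkl_path, "wb") as fp:
        pickle.dump({"name": ds.name, "instance_id": ds.instance_id}, fp)
    with open(iid_path, "w") as fp:
        fp.write(ds.get_instance_id() + "\n")


def restore_from_cache(root):
    """Return the cached datasource only if its id matches the on-disk id."""
    pkl_path, iid_path = _paths(root)
    try:
        with open(pkl_path, "rb") as fp:
            state = pickle.load(fp)
        with open(iid_path) as fp:
            on_disk_iid = fp.read().strip()
    except OSError:
        return None  # nothing cached yet
    ds = ToyDataSource(state["name"], state["instance_id"])
    return ds if ds.get_instance_id() == on_disk_iid else None


def fetch(root, candidates):
    """Try the cache first; otherwise walk the candidates in order."""
    ds = restore_from_cache(root)
    if ds is not None:
        return ds, "restored from cache"
    for candidate in candidates:
        if candidate._get_data():
            persist(root, candidate)
            return candidate, "found by detection"
    return None, "no datasource found"
```

Calling fetch() twice against the same root shows the transition: the first call finds the datasource by detection and persists it, and the second call restores it from the cache even if detection would now fail.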
For local-only datasources (NoCloud, OVF), I think 229 works, specifically
for OVF, since its detection is based on files that are removed after first
boot. I'd like to add a unittest to exercise this path. Specifically, we'd
want:
a) /var/lib/cloud/* populated as it would look after a first boot
b) run main_init with args.local=True
c) mock datasource._get_data to return False (forcing us down the fallback
   path)
d) we should verify that self.datasource matches what's in obj.pkl
We should also test paths around the OVF instance_id changing/resetting. #229
operates under the assumption that the instance_id should not change between
reboots. A second unittest should ensure that if the OVF instance_id changes,
the fallback path does NOT successfully load OVF.
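As a rough illustration of the shape those two tests could take, here is a self-contained sketch against a toy model. Nothing in it is cloud-init's API: ToyOVF, populate_first_boot() and toy_fetch() are hypothetical, the cache holds a plain dict, and a temporary directory stands in for /var/lib/cloud. A real test would drive main_init and the real datasource instead.

```python
import os
import pickle
import tempfile
import unittest
from unittest import mock


class ToyOVF:
    """Hypothetical stand-in for the OVF datasource."""
    name = "OVF"
    instance_id = "iid-ovf-1"

    def _get_data(self):
        return True  # first boot: detection files are still present


def populate_first_boot(root, name, instance_id):
    """(a) Lay down the toy obj.pkl and instance-id as after a first boot."""
    with open(os.path.join(root, "obj.pkl"), "wb") as fp:
        pickle.dump({"name": name, "instance_id": instance_id}, fp)
    with open(os.path.join(root, "instance-id"), "w") as fp:
        fp.write(instance_id + "\n")


def toy_fetch(root, ds):
    """Detect first; on failure fall back to the cache if the ids match."""
    if ds._get_data():
        return {"name": ds.name, "instance_id": ds.instance_id}
    try:
        with open(os.path.join(root, "obj.pkl"), "rb") as fp:
            cached = pickle.load(fp)
        with open(os.path.join(root, "instance-id")) as fp:
            on_disk_iid = fp.read().strip()
    except OSError:
        return None
    return cached if cached["instance_id"] == on_disk_iid else None


class TestFallback(unittest.TestCase):
    def setUp(self):
        tmp = tempfile.TemporaryDirectory()
        self.addCleanup(tmp.cleanup)
        self.root = tmp.name
        populate_first_boot(self.root, "OVF", "iid-ovf-1")

    def test_fallback_restores_cached_datasource(self):
        # (c) force detection to fail, (d) compare with what was cached
        with mock.patch.object(ToyOVF, "_get_data", return_value=False):
            found = toy_fetch(self.root, ToyOVF())
        self.assertEqual({"name": "OVF", "instance_id": "iid-ovf-1"}, found)

    def test_fallback_rejected_when_instance_id_changes(self):
        with open(os.path.join(self.root, "instance-id"), "w") as fp:
            fp.write("iid-ovf-2\n")
        with mock.patch.object(ToyOVF, "_get_data", return_value=False):
            found = toy_fetch(self.root, ToyOVF())
        self.assertIsNone(found)
```

The second test changes only the on-disk instance-id, which is one way to model "the instance_id changed between reboots"; how the real OVF datasource would observe such a change is not covered here.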
For Ec2, which is detected locally (by checking system UUID string), I don't
think we ever use the fallback path due to the local detection; this means
we always re-use the on-disk obj.pkl. I don't think that's a blocker to
merging.