There are three potential sources of the hostname, one of which is
documented in SmartOS's vmadm(1M) as the hostname property. That
property's value is retrieved via the sdc:hostname key. The other
two sources for the hostname are a hostname key in customer_metadata
and the VM's uuid (sdc:uuid). Of these three, the sdc:hostname value
is not used in a meaningful way by DataSourceSmartOS.
This fix changes the fallback mechanism when hostname is not
specified in customer_metadata. The order of precedence for setting
the hostname is now 1) hostname in customer_metadata,
2) sdc:hostname, then 3) sdc:uuid.
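The precedence above can be sketched as a small lookup. This is an illustration, not DataSourceSmartOS's actual code: the function name `resolve_hostname` is hypothetical, and the two metadata sources are assumed to be plain dicts keyed by the names used in this message.

```python
def resolve_hostname(customer_metadata, sdc_metadata):
    """Pick a hostname: customer_metadata hostname, then sdc:hostname, then sdc:uuid.

    Hypothetical helper; both arguments are assumed to be plain dicts.
    """
    # 1) hostname key supplied by the customer
    hostname = customer_metadata.get("hostname")
    if hostname:
        return hostname
    # 2) sdc:hostname, i.e. vmadm's hostname property
    hostname = sdc_metadata.get("sdc:hostname")
    if hostname:
        return hostname
    # 3) fall back to the VM's uuid
    return sdc_metadata.get("sdc:uuid")
```

Treating an empty string the same as a missing key means a blank hostname never wins over a later source.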
cloud-init and mdata-get each have their own implementation of the SmartOS
metadata protocol. If cloud-init and other services that call mdata-get are
run concurrently, crosstalk on the serial port can cause them both to become
confused.
This change makes cloud-init use the same cooperative locking scheme as
mdata-get, preventing crosstalk between mdata-get and cloud-init.
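A minimal sketch of cooperative locking around one request/response exchange, assuming that mdata-get takes an exclusive POSIX record lock on the same serial device. `locked_exchange` and the request text are hypothetical, and a temporary file stands in for the serial port so the example runs anywhere.

```python
import fcntl
import os
import tempfile


def locked_exchange(fd, request, transact):
    """Run one request/response exchange while holding an exclusive lock on fd.

    Assumption: other clients (e.g. mdata-get) take the same kind of POSIX
    record lock on the same device, so they wait instead of interleaving.
    """
    # Blocks until no other process holds a conflicting lock on fd.
    fcntl.lockf(fd, fcntl.LOCK_EX)
    try:
        return transact(fd, request)
    finally:
        # Always release the lock, even if the exchange raises.
        fcntl.lockf(fd, fcntl.LOCK_UN)


if __name__ == "__main__":
    # Stand-in for the serial device: a temporary file opened read/write.
    fd, path = tempfile.mkstemp()
    try:
        written = locked_exchange(fd, b"GET sdc:uuid\n",
                                  lambda f, r: os.write(f, r))
        print(written)
    finally:
        os.close(fd)
        os.remove(path)
```

The lock only helps if every client honors it; it is advisory, which is why both tools must use the same scheme.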
For testing, a VM running on a SmartOS host and pyserial are required.
pyserial remains commented in requirements.txt because most testers
will not be running atop SmartOS.
DataSourceSmartOS: list() should always return a list
If customer_metadata has no keys, the KEYS request returns an empty string.
Callers of the list() method expect a list to be returned and fail with a
stack trace if this expectation is not met.
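A sketch of the guard, assuming the KEYS response body is a newline-separated string of key names. `parse_keys_response` is a hypothetical helper, not the datasource's real method.

```python
def parse_keys_response(response):
    """Return the keys from a KEYS response as a list, possibly empty."""
    # An empty (or missing) body means there are no customer_metadata keys.
    if not response:
        return []
    # Keys are assumed to be newline-separated; drop blank entries.
    return [key for key in response.split("\n") if key]
```

For example, `parse_keys_response("")` returns `[]` instead of an empty string, so callers can always iterate over the result.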
DataSourceSmartOS: hang when metadata service is down
If the metadata service in the host is down while a guest that uses
DataSourceSmartOS is booting, the request from the guest falls into the bit
bucket. When the metadata service is eventually started, the guest has no
awareness of this and does not resend the request. This results in
cloud-init hanging forever with a guest reboot as the only recovery option.
This fix updates the metadata protocol to implement the initialization
phase, as mdata-get and related utilities do. The
initialization phase includes draining all pending data from the serial
port, writing an empty command and getting an expected error message in
reply. If the initialization phase times out, it is retried every five
seconds. Each timeout results in a warning message: "Timeout while
initializing metadata client. Is the host metadata service running?" By
default, warning messages are logged to the console, so the reason for a
hung boot is readily apparent.
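The initialization phase with its retry loop might look roughly like the sketch below. Assumptions: the transport is passed in as `drain`, `write` and `readline` callables, `readline` raises the hypothetical `InitTimeout` on timeout, and the expected error reply to an empty command is taken to be `invalid command`. `initialize_client` is illustrative, not cloud-init's actual code; the warning text is the one quoted above.

```python
import logging
import time

LOG = logging.getLogger(__name__)


class InitTimeout(Exception):
    """Raised by the (hypothetical) transport when a read times out."""


def initialize_client(drain, write, readline, retry_delay=5.0,
                      max_attempts=None, sleep=time.sleep):
    """Perform the initialization handshake, retrying on timeout.

    Returns the number of attempts it took.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            drain()            # discard anything left over on the serial port
            write(b"\n")       # send an empty command
            response = readline()
        except InitTimeout:
            LOG.warning("Timeout while initializing metadata client. "
                        "Is the host metadata service running?")
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(retry_delay)  # retry every five seconds by default
            continue
        # An empty command should be rejected; the reply text is an assumption.
        if response == "invalid command":
            return attempt
        raise ValueError("unexpected initialization reply: %r" % (response,))
```

Leaving `max_attempts` as `None` matches the described behavior of retrying indefinitely; the parameter exists only so the loop can be bounded in tests.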
Fix integration test logic for ec2 to look for network and
availability-zone data under the key path
'ds'=>'meta-data' instead of just 'ds' when parsing instance-data.json.
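A sketch of the corrected lookup, assuming instance-data.json has been loaded into nested dicts. `lookup` and `ec2_network` are hypothetical helpers, and the sample document is made up for illustration.

```python
import json


def lookup(instance_data, *path):
    """Walk nested dict keys, returning None if any key is missing."""
    node = instance_data
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def ec2_network(instance_data):
    # Network data lives under ds -> meta-data, not directly under ds.
    return lookup(instance_data, "ds", "meta-data", "network")


if __name__ == "__main__":
    sample = json.loads('{"ds": {"meta-data": {"network": {"interfaces": {}}}}}')
    print(ec2_network(sample))
```

The same `lookup` call with a different final key would fetch availability-zone data from under 'meta-data'.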
tests: fix integration tests to support lxd 3.0 release
Integration tests previously had a logic path that was unexercised on
Jenkins because we were on an older version of lxc. With the upgrade to lxd
3.0, we need to bump the pylxd dependency pin and fix a typo in the
integration test that checks the lxd version.
The zfs/zpool commands will hang for 10 seconds if /dev/zfs is not
present (bug 1760173). This is a common occurrence for containers
using zfs as rootfs. This change also handles a missing zpool command and
other errors that may occur while executing it.
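A sketch of the defensive pattern: check for /dev/zfs before invoking zpool, and treat a missing binary or a failing command as "no information". `get_zpool_status` is hypothetical and not the function cloud-init uses; the device path and command name are parameters so the behavior can be exercised without ZFS.

```python
import os
import subprocess


def get_zpool_status(pool, zfs_dev="/dev/zfs", zpool_cmd="zpool"):
    """Return `zpool status <pool>` output, or None if it cannot be obtained."""
    # Without /dev/zfs the command would stall (bug 1760173), so skip it.
    if not os.path.exists(zfs_dev):
        return None
    try:
        result = subprocess.run([zpool_cmd, "status", pool],
                                capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        # OSError covers a missing zpool binary (FileNotFoundError);
        # CalledProcessError covers a non-zero exit status.
        return None
    return result.stdout
```

Returning None in every failure case lets the caller fall back to other rootfs detection instead of crashing.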