Comment 3 for bug 1771382

Martin Steigerwald (ms-proact) wrote : Re: ds-identify: fails to recognize NoCloud datasource on boot, ERROR: failed running [127]: blkid -c /dev/null -o export

I ran the blkid command 10 times in a row and always got return code 0. So I don't get why:

    out=$(blkid -c /dev/null -o export) || {
        ret=$?
        error "failed running [$ret]: blkid -c /dev/null -o export"
        DI_FS_LABELS="$UNAVAILABLE:error"
        DI_ISO9660_DEVS="$UNAVAILABLE:error"
        return $ret
    }

is not working on boot. It looks like correct shell code. The only idea I have:

It is run too early for the ISO device to become available. Okay, testing for this with this change:

slestemplate:~ # git diff /usr/lib/cloud-init/ds-identify.orig /usr/lib/cloud-init/ds-identify
diff --git a/usr/lib/cloud-init/ds-identify.orig b/usr/lib/cloud-init/ds-identify
index 9a2db5c..2083734 100755
--- a/usr/lib/cloud-init/ds-identify.orig
+++ b/usr/lib/cloud-init/ds-identify
@@ -199,14 +199,24 @@ read_fs_info() {
         return
     fi
     local oifs="$IFS" line="" delim=","
- local ret=0 out="" labels="" dev="" label="" ftype="" isodevs="" uuids=""
- out=$(blkid -c /dev/null -o export) || {
+ local ret=1 out="" labels="" dev="" label="" ftype="" isodevs="" uuids=""
+ local attempt=1
+ while [ $ret -ne 0 -a $attempt -le 10 ]; do
+ out=$( blkid -c /dev/null -o export 2>&1 )
+ ret=$?
+ if [ $ret -ne 0 ]; then
+ error "failed running [$ret]: blkid -c /dev/null -o export, attempt: $attempt, output: $out"
+ sleep 2
+ fi
+ let attempt++;
+ done
+ if [ $ret -ne 0 ]; then
         ret=$?
         error "failed running [$ret]: blkid -c /dev/null -o export"
         DI_FS_LABELS="$UNAVAILABLE:error"
         DI_ISO9660_DEVS="$UNAVAILABLE:error"
         return $ret
- }
+ fi
     # 'set --' will collapse multiple consecutive entries in IFS for
     # whitespace characters (\n, tab, " ") so we cannot rely on getting
     # empty lines in "$@" below.

Which gets me:

slestemplate:~ # cat /run/cloud-init/ds-identify.log | head -7
[up 8.80s] ds-identify
policy loaded: mode=search report=false found=all maybe=all notfound=disabled
no datasource_list found, using default: MAAS ConfigDrive NoCloud AltCloud Azure Bigstep CloudSigma CloudStack DigitalOcean AliYun Ec2 GCE OpenNebula OpenStack OVF SmartOS Scaleway Hetzner IBMCloud
ERROR: failed running [127]: blkid -c /dev/null -o export, attempt: 1, output: /usr/lib/cloud-init/ds-identify: line 205: blkid: command not found
ERROR: failed running [127]: blkid -c /dev/null -o export, attempt: 2, output: /usr/lib/cloud-init/ds-identify: line 205: blkid: command not found
ERROR: failed running [127]: blkid -c /dev/null -o export, attempt: 3, output: /usr/lib/cloud-init/ds-identify: line 205: blkid: command not found
ERROR: failed running [127]: blkid -c /dev/null -o export, attempt: 4, output: /usr/lib/cloud-init/ds-identify: line 205: blkid: command not found

Which may just mean that during startup via systemd

slestemplate:~ # type blkid
blkid is /sbin/blkid

is not in PATH.

And well, now I learned from the bash manpage that this is exactly what the exit code tells me (manpage: bash(1)):

       If a command is not found, the child process created
       to execute it returns a status of 127. If a command
       is found but is not executable, the return status is
       126.
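
To reproduce that exit code outside of boot, here is a minimal sketch in POSIX sh. The helper name status_with_path and the /nonexistent PATH value are made up for illustration; this is not what systemd actually sets during startup:

```shell
# Hypothetical helper: run a command with a deliberately restricted PATH
# and print the exit status the shell reports for it.
status_with_path() {
    # $1: PATH value to use, remaining arguments: command to run
    _p="$1"; shift
    # Subshell so the caller's PATH is left untouched.
    ( PATH="$_p"; export PATH; "$@" ) >/dev/null 2>&1
    echo $?
}

# With no directory in PATH containing blkid, the lookup itself fails.
status_with_path /nonexistent blkid -c /dev/null -o export
```

If the status printed here matches the one in the log, the problem is the command lookup and not blkid itself.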

But it does not seem that the systemd generator is being run on reboot, because I added:

slestemplate:~ # diff -u cloud-init-generator.orig /usr/lib/systemd/system-generators/cloud-init-generator
--- cloud-init-generator.orig 2018-05-16 13:08:41.302467498 +0200
+++ /usr/lib/systemd/system-generators/cloud-init-generator 2018-05-16 13:22:26.661939261 +0200
@@ -1,6 +1,8 @@
 #!/bin/sh
 set -f

+echo "PATH: $PATH" > /root/path
+
 LOG=""
 DEBUG_LEVEL=1
 LOG_D="/run/cloud-init"

at the beginning of it, yet got no output in /root/path after reboot, while when running it manually I do get the output. So it appears that on reboot something else is calling ds-identify, and that caller does not have /sbin in its PATH.

I have no clue what else might be calling it:

slestemplate:/etc # grep -ir "ds-identify" .

slestemplate:/usr/lib/systemd # grep -ir "ds-identify"

only reports that system-generators/cloud-init-generator.

Also nothing in

slestemplate:/var # LANG=en grep -ir "ds-identify" .
Binary file ./lib/rpm/Packages matches
Binary file ./lib/rpm/Basenames matches
Binary file ./lib/mlocate/mlocate.db matches

So I am done with it for now and will just hardcode the path in ds-identify to /sbin/blkid.

And voila, this finally works. After a few dozen attempts and reboots I have at least found the root cause and a workaround. I think to be really portable ds-identify needs to try harder to find blkid, because hardcoding it to the UsrMerge path /usr/sbin/blkid is going to break on Debian and Ubuntu as long as UsrMerge is not done there. Or one might use /sbin/blkid, as this is hard-linked on SLES 12 and RHEL 7 to /usr/sbin, and I bet these hardlinks will be kept around for decades.
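
What "try harder" could look like, as a rough sketch only: the function name find_cmd and the fallback directory list are my assumptions, nothing like this exists in ds-identify as far as I know. It tries PATH first and then a few well-known directories:

```shell
# Hypothetical helper, not part of ds-identify: locate a command by trying
# PATH first and then a list of fallback directories.
find_cmd() {
    # $1: command name, remaining arguments: fallback directories
    _name="$1"; shift
    # command -v is POSIX; for an external program it prints the full path.
    _found=$(command -v "$_name" 2>/dev/null) && {
        echo "$_found"
        return 0
    }
    for _dir in "$@"; do
        if [ -f "$_dir/$_name" ] && [ -x "$_dir/$_name" ]; then
            echo "$_dir/$_name"
            return 0
        fi
    done
    # Same status the shell uses for "command not found".
    return 127
}

# The fallback directories are an assumption covering both layouts.
BLKID=$(find_cmd blkid /sbin /usr/sbin /bin /usr/bin) ||
    echo "blkid not found in PATH or fallback directories" >&2
```

The script would then call "$BLKID" -c /dev/null -o export instead of relying on whatever PATH the caller happens to provide.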

Gosh, this works. This finally works. Retitling again and adding patch.