buildd fails to build trusty based images with `mknod: ‘/dev/loopX’: File exists` errors

Bug #1723216 reported by Francis Ginther
This bug affects 2 people
Affects: launchpad-buildd
Status: Fix Released
Importance: Critical
Assigned to: Colin Watson

Bug Description

While trying to build trusty docker images, we hit the following type of error every time (with an apparently random loop device number). We have seen this on every trusty docker build this week. These builds had not been run since August, when they succeeded.

RUN: /usr/share/launchpad-buildd/slavebin/slave-prep
Forking launchpad-buildd slave process...
Kernel version: Linux bos01-arm64-034 4.4.0-97-generic #120-Ubuntu SMP Tue Sep 19 17:31:48 UTC 2017 aarch64
Buildd toolchain package versions: launchpad-buildd_152 python-lpbuildd_152 sbuild_0.67.0-2ubuntu7.1 bzr-builder_0.7.3+bzr174~ppa13~ubuntu14.10.1 bzr_2.7.0-2ubuntu3.1 git-build-recipe_0.3.4~git201611291343.dcee459~ubuntu16.04.1 git_1:2.7.4-0ubuntu1.3 dpkg-dev_1.18.4ubuntu1.2 python-debian_0.1.27ubuntu2.
Syncing the system clock with the buildd NTP service...
12 Oct 16:16:54 ntpdate[1808]: adjust time server 10.211.37.1 offset 0.081996 sec
RUN: /usr/share/launchpad-buildd/slavebin/in-target unpack-chroot --backend=lxd --series=trusty --arch=arm64 LIVEFSBUILD-112474 /home/buildd/filecache-default/76dc409df973c3787fa3d97bbe6fa0d7ccf4bcab
Creating target for build LIVEFSBUILD-112474
RUN: /usr/share/launchpad-buildd/slavebin/in-target mount-chroot --backend=lxd --series=trusty --arch=arm64 LIVEFSBUILD-112474
Starting target for build LIVEFSBUILD-112474
mknod: ‘/dev/loop5’: File exists
Traceback (most recent call last):
  File "/usr/share/launchpad-buildd/slavebin/in-target", line 27, in <module>
    sys.exit(main())
  File "/usr/share/launchpad-buildd/slavebin/in-target", line 23, in main
    return args.operation.run()
  File "/usr/lib/launchpad-buildd/lpbuildd/target/lifecycle.py", line 40, in run
    self.backend.start()
  File "/usr/lib/launchpad-buildd/lpbuildd/target/lxd.py", line 361, in start
    "b", "7", str(minor)])
  File "/usr/lib/launchpad-buildd/lpbuildd/target/lxd.py", line 394, in run
    subprocess.check_call(cmd, **kwargs)
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['lxc', 'exec', 'lp-trusty-arm64', '--', 'linux64', 'mknod', '-m', '0660', '/dev/loop5', 'b', '7', '5']' returned non-zero exit status 1
RUN: /usr/share/launchpad-buildd/slavebin/in-target umount-chroot --backend=lxd --series=trusty --arch=arm64 LIVEFSBUILD-112474
Stopping target for build LIVEFSBUILD-112474
RUN: /usr/share/launchpad-buildd/slavebin/in-target remove-build --backend=lxd --series=trusty --arch=arm64 LIVEFSBUILD-112474
Removing build LIVEFSBUILD-112474

The above log is from https://launchpadlibrarian.net/340643111/buildlog_ubuntu_trusty_arm64_docker-ubuntu-core_BUILDING.txt.gz

Here are some other examples:
https://launchpadlibrarian.net/340326870/buildlog_ubuntu_trusty_armhf_docker-ubuntu-core_BUILDING.txt.gz
https://launchpadlibrarian.net/340326770/buildlog_ubuntu_trusty_amd64_docker-ubuntu-core_BUILDING.txt.gz

Colin Watson (cjwatson) wrote :

As of systemd 211 (i.e. >= vivid), the default udev rules arrange to not create loop devices. In older releases, udev creates those, but we can race with it on startup.

I think we need to:

 * wait until the container has done more of its startup work before proceeding
 * check whether /dev/loop* exist before creating them

Neither of these is sufficient on its own.
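
A minimal sketch of the second point, assuming hypothetical `exists` and `run` callables that operate inside the container (for example by wrapping `lxc exec`). This is an illustration, not the actual launchpad-buildd change. It also shows why the check alone is not enough: something else can still create the node between the check and the `mknod`, so a failed `mknod` is tolerated when the device turns out to exist afterwards.

```python
import subprocess


def ensure_loop_device(minor, exists, run):
    """Create /dev/loop<minor> unless it is already present.

    `exists(path)` and `run(cmd)` are hypothetical injected helpers, so the
    same logic can target a container or be exercised with fakes.
    Returns True if this call created the node, False if it already existed.
    """
    path = "/dev/loop%d" % minor
    if exists(path):
        return False
    # Same mknod arguments as in the failing command above: block device,
    # major number 7, minor number equal to the loop device index.
    cmd = ["mknod", "-m", "0660", path, "b", "7", str(minor)]
    try:
        run(cmd)
    except subprocess.CalledProcessError:
        # Something else may have created the node between our check and
        # our mknod; that is fine as long as the node exists now.
        if exists(path):
            return False
        raise
    return True
```

With a real backend, `run` could be a thin wrapper around `subprocess.check_call` that prefixes the command with `lxc exec <container> --`, as in the traceback above.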

Changed in launchpad-buildd:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Colin Watson (cjwatson)
importance: High → Critical

Colin Watson (cjwatson) wrote :

Checking whether the container has started up sufficiently turns out to be difficult, mainly because we disable most services in buildd containers. Possibly a better approach would be to poke the necessary udev rules into the container before it starts.

Colin Watson (cjwatson) wrote :

Fixed in launchpad-buildd 154.

launchpad-buildd (154) xenial; urgency=medium

  * The previous patch was labouring under mistaken assumptions: it's
    actually the mounted-dev Upstart job that we race with in trusty
    containers, so neuter that instead.

 -- Colin Watson <email address hidden> Thu, 19 Oct 2017 10:29:30 +0100

launchpad-buildd (153) xenial; urgency=medium

  * Defend against racing with udev to create loop devices in trusty
    containers (LP: #1723216).

 -- Colin Watson <email address hidden> Wed, 18 Oct 2017 07:57:32 +0100

Changed in launchpad-buildd:
status: In Progress → Fix Released