ceph-osd stays blocked on Yakkety: "No block devices detected using current configuration"

Bug #1631328 reported by Ryan Beisner
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
ceph (Ubuntu) | Fix Released | Critical | James Page |
  Yakkety | Fix Released | Undecided | Unassigned |
ceph-osd (Juju Charms Collection) | Invalid | Undecided | Unassigned |

Bug Description

The "ceph/default", "ceph_radosgw/simple_nonha" and "ceph/harden" mojo specs on Yakkety are failing as follows:

Units are consistently blocked in Juju status:
ceph-osd stays blocked on Yakkety: "No block devices detected using current configuration"

Snippets from ceph-osd unit syslog:
mkjournal error creating journal on /var/lib/ceph/tmp/mnt.W59Jhd/journal: (13) Permission denied

Oct 7 03:13:30 juju-osci-sv13-machine-4 systemd[1]: Failed to start Ceph disk activation: /dev/vdb1.

See:

http://pastebin.ubuntu.com/23288515/

Ryan Beisner (1chb1n) wrote :

FYI, from a pristine Yakkety daily image on the same private cloud:

ubuntu@yakkety104939:~$ lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vdb     253:16   0   40G  0 disk /mnt
vda     253:0    0   20G  0 disk
├─vda14 253:14   0    4M  0 part
├─vda15 253:15   0  106M  0 part /boot/efi
└─vda1  253:1    0 19.9G  0 part /

ubuntu@yakkety104939:~$ mount -l
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=1018052k,nr_inodes=254513,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=204780k,mode=755)
/dev/vda1 on / type ext4 (rw,relatime,data=ordered) [cloudimg-rootfs]
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=27,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=10007)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/vda15 on /boot/efi type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro) [UEFI]
/dev/vdb on /mnt type ext4 (rw,relatime,data=ordered) [ephemeral0]
lxcfs on /var/lib/lxcfs type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=204776k,mode=700,uid=1000,gid=1000)

ubuntu@yakkety104939:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            995M     0  995M   0% /dev
tmpfs           200M  3.2M  197M   2% /run
/dev/vda1        20G  987...


description: updated
Ryan Beisner (1chb1n) wrote :
Ryan Beisner (1chb1n) wrote :
description: updated
James Page (james-page) wrote :

Not sure this is a charm problem - digging deeper

James Page (james-page) wrote :

Might be a 10.2.3 regression (that's the version difference between xenial and yakkety).

James Page (james-page) wrote :

2016-10-07 11:56:18.428993 7f393e21b8c0 0 set uid:gid to 64045:64045 (ceph:ceph)
2016-10-07 11:56:18.429145 7f393e21b8c0 0 ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b), process ceph-osd, pid 23516
2016-10-07 11:56:18.437495 7f393e21b8c0 1 filestore(/var/lib/ceph/tmp/mnt.qKtocb) mkfs in /var/lib/ceph/tmp/mnt.qKtocb
2016-10-07 11:56:18.437597 7f393e21b8c0 1 filestore(/var/lib/ceph/tmp/mnt.qKtocb) mkfs fsid is already set to 9a734eae-8313-4361-998e-af52d9e0432c
2016-10-07 11:56:18.437645 7f393e21b8c0 1 filestore(/var/lib/ceph/tmp/mnt.qKtocb) write_version_stamp 4
2016-10-07 11:56:18.440500 7f393e21b8c0 0 filestore(/var/lib/ceph/tmp/mnt.qKtocb) backend xfs (magic 0x58465342)
2016-10-07 11:56:18.446579 7f393e21b8c0 1 filestore(/var/lib/ceph/tmp/mnt.qKtocb) leveldb db exists/created
2016-10-07 11:56:18.446880 7f393e21b8c0 -1 filestore(/var/lib/ceph/tmp/mnt.qKtocb) mkjournal error creating journal on /var/lib/ceph/tmp/mnt.qKtocb/journal: (13) Permission denied
2016-10-07 11:56:18.447058 7f393e21b8c0 -1 OSD::mkfs: ObjectStore::mkfs failed with error -13
2016-10-07 11:56:18.447289 7f393e21b8c0 -1  ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.qKtocb: (13) Permission denied

James Page (james-page) wrote :

# ls -lrt /dev/vdb*
brw-rw---- 1 root disk 253, 16 Oct 7 11:56 /dev/vdb
brw-rw---- 1 root root 253, 17 Oct 7 11:56 /dev/vdb1
brw-rw---- 1 root root 253, 18 Oct 7 11:56 /dev/vdb2

vs (xenial)

$ ls -l /dev/vdb*
brw-rw---- 1 root disk 253, 16 Oct 7 11:43 /dev/vdb
brw-rw---- 1 ceph ceph 253, 17 Oct 7 11:43 /dev/vdb1
brw-rw---- 1 ceph ceph 253, 18 Oct 7 11:55 /dev/vdb2

I think the udev rules should be setting the ownership of the partitions, but I'm not 100% sure.
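To compare the two listings mechanically, here is a minimal Python sketch that parses `ls -l` lines like the ones above and flags partitions not owned by ceph:ceph. The helper name `non_ceph_partitions` is hypothetical, and it assumes the ceph-disk partitions are the numbered children of the whole disk, as with /dev/vdb1 and /dev/vdb2 here.

```python
import re

# Matches `ls -l` output for a block device, e.g.
# "brw-rw---- 1 root root 253, 17 Oct 7 11:56 /dev/vdb1"
LS_LINE = re.compile(
    r"^(?P<mode>b\S+)\s+\d+\s+(?P<owner>\S+)\s+(?P<group>\S+)\s+"
    r"\d+,\s*\d+\s+.*\s(?P<path>/dev/\S+)$"
)


def non_ceph_partitions(listing: str) -> list[str]:
    """Return partition paths (names ending in a digit) not owned by ceph:ceph.

    Hypothetical helper for illustration; the whole-disk node (e.g. /dev/vdb)
    is skipped because it is expected to stay root:disk on both releases.
    """
    bad = []
    for line in listing.splitlines():
        m = LS_LINE.match(line.strip())
        if not m or not m["path"][-1].isdigit():
            continue  # not a block-device line, or the whole-disk node
        if (m["owner"], m["group"]) != ("ceph", "ceph"):
            bad.append(m["path"])
    return bad
```

Fed the Yakkety listing it would report both /dev/vdb1 and /dev/vdb2; fed the Xenial listing it would report nothing.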

James Page (james-page) wrote :

OK so I think I see the issue; running:

sudo udevadm trigger --subsystem-match=block --action=add

triggers the udev rules that kick off the ceph processes; without a reload of the udev daemon, the permissions don't appear to be set correctly on the devices.

James Page (james-page) wrote :

Which is odd, as it looks like the only part that is not picked up is the permission settings; the actual ceph-disk trigger calls are still being made, but fail due to invalid permissions.

James Page (james-page) wrote :

From the udev daemon output on a broken install:

Oct 07 12:17:01 juju-20ab35ca-8336-488f-81bb-931feb9a29eb-machine-17 systemd-udevd[454]: specified user 'ceph' unknown
Oct 07 12:17:01 juju-20ab35ca-8336-488f-81bb-931feb9a29eb-machine-17 systemd-udevd[454]: specified group 'ceph' unknown

I wonder whether it's because the user is created after the udev rules are installed?
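The "specified user 'ceph' unknown" messages indicate that udev could not resolve the names used in the rule's owner/group assignment. A minimal Python sketch of the same kind of lookup, using the standard `pwd` and `grp` modules, can show on a unit whether the ceph account exists at a given moment. `owner_resolvable` is a hypothetical helper; it does not call udev itself.

```python
import grp
import pwd


def owner_resolvable(user: str, group: str) -> dict[str, bool]:
    """Report whether a user name and a group name resolve via NSS.

    Mirrors the kind of lookup behind udev's "specified user ... unknown"
    message; it only queries the passwd/group databases.
    """
    result = {}
    try:
        pwd.getpwnam(user)
        result["user"] = True
    except KeyError:
        result["user"] = False
    try:
        grp.getgrnam(group)
        result["group"] = True
    except KeyError:
        result["group"] = False
    return result


if __name__ == "__main__":
    # Before ceph-common has created the account, both values would be False.
    print(owner_resolvable("ceph", "ceph"))
```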

James Page (james-page) wrote :

This is a little chicken and egg: the ceph-common package creates the ceph user and group, however udev notices the new rule file before that happens...
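To make the ordering problem concrete, here is a toy Python model, not udev's real implementation. It assumes, as the log lines above suggest, that the owner name is resolved when the rule file is read, so a rule read before the account exists leaves devices with the default root owner. `ToyUdev` and `install` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyUdev:
    """Toy model: a rule's owner name is resolved once, when the rule is loaded."""
    accounts: set[str] = field(default_factory=lambda: {"root"})
    rule_owner: Optional[str] = None

    def load_rule(self, owner: str) -> None:
        # An unknown name is dropped, as in "specified user 'ceph' unknown".
        self.rule_owner = owner if owner in self.accounts else None

    def add_device(self) -> str:
        # Without a resolved owner, the device keeps the default owner.
        return self.rule_owner or "root"


def install(user_first: bool) -> str:
    """Simulate one package install order and return the resulting owner."""
    udev = ToyUdev()
    if user_first:
        udev.accounts.add("ceph")  # ceph-common creates ceph:ceph first
        udev.load_rule("ceph")     # then ceph-osd's rule file is noticed
    else:
        udev.load_rule("ceph")     # rule noticed before the user exists
        udev.accounts.add("ceph")  # user created afterwards, too late
    return udev.add_device()
```

In this model `install(False)` yields a root-owned device, matching the Yakkety listing, while `install(True)` yields a ceph-owned one.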

James Page (james-page) wrote :

This looks like a behavioural change in udev between xenial and yakkety.

James Page (james-page) wrote :

I wonder whether dh_installudev does something magical (ceph does not use that).

Changed in ceph-osd (Juju Charms Collection):
status: New → Invalid
James Page (james-page) wrote :

Not a charm bug - marking invalid.

James Page (james-page)
Changed in ceph (Ubuntu):
status: New → Triaged
importance: Undecided → Critical
Martin Pitt (pitti) wrote :

> This is a little chicken and egg - the ceph-common package configures the ceph/ceph user/group, however udev notices the new rule file prior to that happening....

udev uses inotify to pick up the new rule, but in order to actually "run" the rule, the handled device must either be removed and re-added or re-triggered with udevadm trigger (as in comment #8). So the postinst should do that after creating the user and group.
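The suggestion amounts to an ordering constraint in the maintainer script: create the account first, then re-trigger block devices. A minimal Python sketch that builds the commands in that order and, by default, only prints them. The `adduser` arguments are an assumption for illustration (the real ceph-common postinst passes further options that are not reproduced here); the `udevadm trigger` command is the one from comment #8.

```python
import subprocess


def postinst_commands() -> list[list[str]]:
    """Hypothetical postinst steps, in the order suggested above."""
    return [
        # Assumed invocation: create the ceph system user and group.
        ["adduser", "--system", "--group", "ceph"],
        # From comment #8: re-run the block-device rules now that ceph:ceph exists.
        ["udevadm", "trigger", "--subsystem-match=block", "--action=add"],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> list[str]:
    """Print (dry run) or execute each command, stopping on the first failure."""
    executed = []
    for cmd in commands:
        line = " ".join(cmd)
        if dry_run:
            print("would run:", line)
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
        executed.append(line)
    return executed
```

Keeping `dry_run=True` as the default means the sketch can be inspected safely outside a real unit.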

James Page (james-page)
Changed in ceph (Ubuntu):
assignee: nobody → James Page (james-page)
status: Triaged → In Progress
Steve Langasek (vorlon) wrote :

I think the right answer here is for ceph-osd to Pre-Depend: on ceph-common which sets up the users, then there's no need to reload because the rules will parse correctly the first time.

Also, in looking at the binary package, I don't see anywhere that ceph-osd ever calls 'udevadm trigger --subsystem-match=block --action=change' - this still needs to be explicitly done, independent of the reload handling (or not), in order to apply the newly-installed rules to existing devices.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 10.2.3-0ubuntu2

---------------
ceph (10.2.3-0ubuntu2) yakkety; urgency=medium

  * d/control: Add Pre-Depends on ceph-common to ceph-osd to ensure
    that ceph user and group are created prior to unpacking of udev
    rules (LP: #1631328).
  * d/control: Drop surplus Recommends on ceph-common for ceph-mon, as
    it already has a in-direct Depends via ceph-base.

 -- James Page <email address hidden> Sat, 08 Oct 2016 16:16:00 +0100

Changed in ceph (Ubuntu):
status: In Progress → Fix Released
Martin Pitt (pitti) wrote : Please test proposed package

Hello Ryan, or anyone else affected,

Accepted ceph into yakkety-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ceph/10.2.3-0ubuntu2.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

no longer affects: systemd (Ubuntu)
no longer affects: systemd (Ubuntu Yakkety)
Changed in ceph (Ubuntu Yakkety):
status: New → Fix Committed
tags: added: verification-needed
James Page (james-page)
tags: added: verification-done
removed: verification-needed
Robie Basak (racb) wrote : Update Released

The verification of the Stable Release Update for ceph has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 10.2.3-0ubuntu2.1

---------------
ceph (10.2.3-0ubuntu2.1) yakkety; urgency=medium

  * rgw: Fixes for creation times for buckets (LP: #1587261).
    - d/p/rgw_rados-creation_time.patch: Backport fix from upstream master.
      Fix logic error that leads to creation time being 0 instead of current
      time when creating buckets.

ceph (10.2.3-0ubuntu2) yakkety; urgency=medium

  * d/control: Add Pre-Depends on ceph-common to ceph-osd to ensure
    that ceph user and group are created prior to unpacking of udev
    rules (LP: #1631328).
  * d/control: Drop surplus Recommends on ceph-common for ceph-mon, as
    it already has a in-direct Depends via ceph-base.

 -- Frode Nordahl <email address hidden> Fri, 28 Oct 2016 07:52:07 +0200

Changed in ceph (Ubuntu Yakkety):
status: Fix Committed → Fix Released