ceph-volume: block device permissions sometimes not set on initial activate call

Bug #1767087 reported by James Page
This bug affects 3 people
Affects          Status        Importance  Assigned to  Milestone
Ceph OSD Charm   Fix Released  High        James Page
ceph (Ubuntu)    Fix Released  High        James Page

Bug Description

When using bluestore or filestore with a separate journal device, we often see that the ceph-osd daemon crashes on activation with a 'permission denied' error on the link to the block device supporting the particular component of the OSD (bluestore block/wal/db or journal device for filestore).

chown -R ceph:ceph /dev/dm-X fixes the issue and the OSD can then be started; I've also not seen this issue occur during a normal reboot (the permissions are re-asserted by ceph-volume as OSDs are started up).
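The manual workaround can be sketched as a small POSIX-shell helper. This is illustrative only: the device path placeholder and the `wrong_owner`/`fix_osd_dev` names are not part of ceph-volume or the charm.

```shell
#!/bin/sh
# Sketch of the manual workaround: re-assert ceph:ceph on the dm node
# behind an OSD's block symlink when udev has reset it to root:disk.

wrong_owner() {
    # Succeeds (exit 0) when the "user:group" string is not the
    # ceph:ceph pair that ceph-osd needs to open its block device.
    [ "$1" != "ceph:ceph" ]
}

fix_osd_dev() {
    dev=$1                          # e.g. /dev/ceph-<vg-uuid>/osd-block-<lv-uuid>
    node=$(readlink -f "$dev")      # resolves to the underlying /dev/dm-X node
    if wrong_owner "$(stat -c '%U:%G' "$node")"; then
        chown ceph:ceph "$node"
    fi
}
```

This only papers over the race for one boot; the ownership can be lost again whenever udev reprocesses the block device.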

The hunch is that something udev-related is racing with ceph-volume on the initial activate call, but I haven't tracked it down yet.

Tags: cpe-onsite
James Page (james-page)
summary: - ceph-volume block device permissions not set on initial activate call
+ ceph-volume: block device permissions sometimes not set on initial
+ activate call
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ceph (Ubuntu):
status: New → Confirmed
Ante Karamatić (ivoks)
tags: added: cpe-onsite
James Page (james-page) wrote :

This is a race between ceph-volume and udev. If I force the permissions to be correct by starting the ceph-volume systemd unit for an OSD:

$ getfacl /dev/ceph-4a94b8ca-bc68-4ac6-a5ec-dc10549af193/osd-block-4a94b8ca-bc68-4ac6-a5ec-dc10549af193
# file: dev/ceph-4a94b8ca-bc68-4ac6-a5ec-dc10549af193/osd-block-4a94b8ca-bc68-4ac6-a5ec-dc10549af193
# owner: ceph
# group: ceph
user::rw-
group::rw-
other::---

then re-trigger udev events for block devices:

$ udevadm trigger --subsystem-match=block --action=add
$ getfacl /dev/ceph-4a94b8ca-bc68-4ac6-a5ec-dc10549af193/osd-block-4a94b8ca-bc68-4ac6-a5ec-dc10549af193
# file: dev/ceph-4a94b8ca-bc68-4ac6-a5ec-dc10549af193/osd-block-4a94b8ca-bc68-4ac6-a5ec-dc10549af193
# owner: root
# group: disk
user::rw-
group::rw-
other::---

you can see the permissions revert from ceph/ceph back to root/disk.

James Page (james-page)
Changed in ceph (Ubuntu):
status: Confirmed → In Progress
importance: Undecided → High
assignee: nobody → James Page (james-page)
James Page (james-page) wrote :

I'm not sure this is a packaging bug; the LVM layout implemented in the ceph-osd charm is specific to the charm. It might represent a best practice worth encoding for everyone, but that's not a given right now.

Changed in charm-ceph-osd:
status: New → In Progress
importance: Undecided → High
assignee: nobody → James Page (james-page)
milestone: none → 18.05
Changed in ceph (Ubuntu):
status: In Progress → Incomplete
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-osd (master)

Reviewed: https://review.openstack.org/567198
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-osd/commit/?id=35ad3de4f289e68f4d974e1bb25d00723f55cd12
Submitter: Zuul
Branch: master

commit 35ad3de4f289e68f4d974e1bb25d00723f55cd12
Author: James Page <email address hidden>
Date: Wed May 9 12:36:00 2018 +0100

    ceph-volume: Install charm specific udev rules

    Ensure that LV's created using the LVM layout implemented
    by this charm are correctly owned by the ceph user and group,
    ensuring that ceph-osd processes can start correctly at all
    times.

    Change-Id: I23ea51e3bffe7207f75782c5f34b796e9eed2c80
    Closes-Bug: 1767087
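The exact rule installed by that commit is not reproduced here, but a minimal sketch of the approach, assuming device-mapper's DM_VG_NAME/DM_LV_NAME udev properties and the charm's ceph-<uuid>/osd-* volume naming seen above (the rules file name is illustrative), could look like:

```
# /etc/udev/rules.d/95-charm-ceph-osd.rules (illustrative name)
# Match device-mapper block devices belonging to the charm's LVM layout
# and hand them to the ceph user/group whenever udev processes them,
# so a later retrigger cannot revert them to root:disk.
ACTION=="add|change", SUBSYSTEM=="block", \
  ENV{DM_VG_NAME}=="ceph-*", ENV{DM_LV_NAME}=="osd-*", \
  OWNER="ceph", GROUP="ceph", MODE="0660"
```

Because udev itself asserts the ownership, the rule closes the race rather than relying on ceph-volume winning it on the initial activate call.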

Changed in charm-ceph-osd:
status: In Progress → Fix Committed
David Ames (thedac)
Changed in charm-ceph-osd:
status: Fix Committed → Fix Released
James Page (james-page)
Changed in ceph (Ubuntu):
status: Incomplete → In Progress
milestone: none → ubuntu-18.08
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 13.2.1+dfsg1-0ubuntu3

---------------
ceph (13.2.1+dfsg1-0ubuntu3) disco; urgency=medium

  * No-change rebuild for python3.7 as the default python3.

 -- Matthias Klose <email address hidden> Tue, 30 Oct 2018 19:25:09 +0100

Changed in ceph (Ubuntu):
status: In Progress → Fix Released