bcache mounts inconsistent after node reboots

Bug #1676991 reported by Ryan Beisner
This bug affects 3 people
Affects                     Status        Importance  Assigned to  Milestone
MAAS                        Invalid       Undecided   Unassigned
OpenStack Charm Test Infra  Fix Released  Undecided   Unassigned
curtin                      Fix Released  High        Unassigned

Bug Description

On MAAS-deployed nodes with bcache-fronted disks, the bcache device names and mount points can change across reboots, which can leave systems unable to boot, cause outages, and risk data loss.

Perhaps a disk ID or label should be used to ensure the intended device is brought up on the intended mount point?

ubuntu@mutus:/var/lib/lxd⟫ cat /etc/fstab
# The following two lines are the original values as-deployed by maas
#/dev/bcache1 / ext4 defaults 0 0
#/dev/bcache0 /srv ext4 defaults 0 0
#
# After a reboot, the root and srv partitions inverted. I had to edit
# fstab for things to work again. The following two lines became necessary:
/dev/bcache0 / ext4 defaults 0 0
/dev/bcache1 /srv ext4 defaults 0 0
#
UUID=3793a969-b311-4e8b-8332-06513407aeed /boot ext4 defaults 0 0
UUID=186E-7F1F /boot/efi vfat defaults 0 0
/swap.img none swap sw 0 0

Andres Rodriguez (andreserl) wrote :

Hi Ryan,

Since curtin is what writes out the storage configuration, I believe this is a curtin issue, so I'm opening a curtin task for it. That said, could you please provide the following:

1. What are the MAAS and curtin versions?
dpkg -l | grep maas
dpkg -l | grep curtin

2. Get the storage config:

maas <user> machine get-curtin-config <system_id_of_machine>

Changed in maas:
status: New → Incomplete
Andres Rodriguez (andreserl) wrote :

Actually, after a short chat with Blake, we believe this is a curtin bug. The information above is still required though.

Changed in maas:
status: Incomplete → Invalid
Ryan Beisner (1chb1n) wrote :

FWIW, the affected host was originally deployed from our long-running MAAS 1.9.x lab, but it will be redeployed in our new MAAS 2 lab next week.

See the attached file, which contains the requested dpkg info and curtin details.

Thank you.

Ryan Beisner (1chb1n) wrote :
Ryan Harper (raharper) wrote :

Can we get the install log from the initial deployment?

If possible, could you also capture the serial console output from the failed boot?

Nonetheless, we can certainly mount bcache devices by UUID.

Changed in curtin:
importance: Undecided → High
status: New → Incomplete
Ryan Beisner (1chb1n) wrote :

Pasting this to track the serverstack redeploy. We'll need to edit the fstabs so that mounts come up predictably across reboots:

http://pastebin.ubuntu.com/24334043/

Ryan Beisner (1chb1n) wrote :

For long-running nodes (i.e. serverstack), we've worked around this by using filesystem UUIDs in fstab.
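A minimal POSIX sh sketch of that workaround, assuming util-linux `blkid` is installed: it reads an fstab on stdin and rewrites every entry whose device field is `/dev/bcacheN` into the `UUID=` form, passing comments and other entries through unchanged. The function name `fstab_bcache_to_uuid` and the `UUID_LOOKUP` override are hypothetical, introduced only for illustration and testing; this is not how MAAS or curtin generate fstab.

```shell
#!/bin/sh
# Hypothetical helper (not part of MAAS or curtin): rewrite fstab entries whose
# device field is /dev/bcacheN so that they use UUID=<filesystem UUID> instead.
# Reads fstab text on stdin and writes the rewritten text to stdout.
# UUID_LOOKUP is a hypothetical override (used for testing); by default the
# UUID comes from util-linux blkid, which prints only the value with these flags.
fstab_bcache_to_uuid() {
    lookup=${UUID_LOOKUP:-"blkid -s UUID -o value"}
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            /dev/bcache[0-9]*)
                # Split the entry into fields on whitespace, with globbing off
                # so characters such as '*' in options are left alone.
                set -f
                set -- $line
                set +f
                dev=$1
                shift
                # $lookup is deliberately unquoted so it splits into a command.
                uuid=$($lookup "$dev")
                if [ -n "$uuid" ]; then
                    # The remaining fields are re-joined with single spaces.
                    printf 'UUID=%s %s\n' "$uuid" "$*"
                else
                    # No UUID (e.g. no filesystem on the device): keep as-is.
                    printf '%s\n' "$line"
                fi
                ;;
            *)
                # Comments, blank lines and non-bcache entries pass through.
                printf '%s\n' "$line"
                ;;
        esac
    done
}
```

For example, `fstab_bcache_to_uuid < /etc/fstab > /tmp/fstab.new` followed by `diff /etc/fstab /tmp/fstab.new`. Because `blkid` resolves `/dev/bcacheN` as enumerated on the current boot, run it only while the current mounts are known to be the intended ones.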

Changed in charm-test-infra:
status: New → Confirmed
tags: added: canonical-bootstack cpec
Nobuto Murata (nobuto) wrote :

With at least the MAAS and curtin combination below, /etc/fstab entries for bcache devices are written using UUIDs, so I'm marking curtin as Fix Released. It looks like lp:curtin revision 484 is included in the xenial SRU (xenial has revno 505 now).

$ apt policy maas python3-curtin
maas:
  Installed: 2.2.1-6078-g2a6d96e-0ubuntu1~16.04.1
  Candidate: 2.2.1-6078-g2a6d96e-0ubuntu1~16.04.1
  Version table:
 *** 2.2.1-6078-g2a6d96e-0ubuntu1~16.04.1 500
        500 http://ppa.launchpad.net/maas/stable/ubuntu xenial/main amd64 Packages
        500 http://ppa.launchpad.net/maas/stable/ubuntu xenial/main i386 Packages
        100 /var/lib/dpkg/status
     2.2.0+bzr6054-0ubuntu2~16.04.1 500
        500 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        500 http://archive.ubuntu.com/ubuntu xenial-updates/main i386 Packages
     2.0.0~beta3+bzr4941-0ubuntu1 500
        500 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages
        500 http://archive.ubuntu.com/ubuntu xenial/main i386 Packages
python3-curtin:
  Installed: 0.1.0~bzr505-0ubuntu1~16.04.1
  Candidate: 0.1.0~bzr505-0ubuntu1~16.04.1
  Version table:
 *** 0.1.0~bzr505-0ubuntu1~16.04.1 500
        500 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        500 http://archive.ubuntu.com/ubuntu xenial-updates/main i386 Packages
        100 /var/lib/dpkg/status
     0.1.0~bzr482-0ubuntu1~16.04.1 500
        500 http://ppa.launchpad.net/maas/stable/ubuntu xenial/main amd64 Packages
        500 http://ppa.launchpad.net/maas/stable/ubuntu xenial/main i386 Packages
     0.1.0~bzr365-0ubuntu1 500
        500 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages
        500 http://archive.ubuntu.com/ubuntu xenial/main i386 Packages

FWIW, the bcache layout was set with the following command:

$ maas "$PROFILE" machine set-storage-layout "$system_id" \
        storage_layout=bcache \
        cache_no_part=true \
        cache_mode=writeback
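To check on a deployed node that fstab no longer depends on bcache enumeration order, one can flag any active entry whose device field is still a `/dev/bcacheN` name. This is a hypothetical sketch (the function name `fstab_has_bcache_names` is illustrative), not a MAAS or curtin tool.

```shell
#!/bin/sh
# Hypothetical check (not a MAAS or curtin tool): print every active fstab
# entry whose device field is a /dev/bcacheN name. Returns 0 if at least one
# such entry exists, nonzero otherwise.
# Usage: fstab_has_bcache_names [FILE]    # FILE defaults to /etc/fstab
fstab_has_bcache_names() {
    # Comment lines never match: their first field starts with '#'.
    awk '$1 ~ /^\/dev\/bcache[0-9]+$/ { print; found = 1 }
         END { exit (found ? 0 : 1) }' "${1:-/etc/fstab}"
}
```

A success status from `fstab_has_bcache_names` means at least one mount still relies on the kernel enumerating bcache devices in the same order on every boot.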

Changed in curtin:
status: Incomplete → Fix Released
Ante Karamatić (ivoks)
tags: added: cpe-onsite
removed: cpec
Ryan Beisner (1chb1n)
Changed in charm-test-infra:
status: Confirmed → Fix Released
Dmitrii Shcherbakov (dmitriis) wrote :

Just to clarify for anyone reading this later:

Fix Released here does not mean that the /dev/bcache<n> names on a booted system will match what was configured in MAAS.

For example, the device shown as /dev/bcache0 in MAAS and in the generated curtin config may well end up as /dev/bcache3 on the booted system, because enumeration order is not guaranteed.

https://wiki.ubuntu.com/ServerTeam/Bcache
"Warning! If you have multiple bcache devices, you NEED to use UUID, enumeration of bcache devices is not guaranteed."

The /dev/disk/by-dname/bcache<i> symlinks are supposed to solve this, but in practice they don't:

View 1 (curtin) - bcache0 is as we configured:

http://paste.ubuntu.com/25774666/
  - backing_device: sda-part3
    cache_device: nvme0n1
    cache_mode: writeback
    id: bcache0
    name: bcache0
    type: bcache

View 2 (by-dname) - bcache0 is bcache4:

$ tree /dev/disk/by-dname/
/dev/disk/by-dname/
├── bcache0 -> ../../bcache4 # should point to bcache3 (backed by sda3)
├── bcache1 -> ../../bcache0
├── bcache2 -> ../../bcache1
├── bcache3 -> ../../bcache2
├── bcache4 -> ../../bcache3
├── sda -> ../../sda
├── sda-part1 -> ../../sda1
├── sda-part2 -> ../../sda2
└── sda-part3 -> ../../sda3

0 directories, 9 files

View 3 - bcache0 is bcache3 (actual enumeration result):

$ lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda           8:0    0  3.7T  0 disk
├─sda1        8:1    0  476M  0 part /boot/efi
├─sda2        8:2    0  1.9G  0 part /boot
└─sda3        8:3    0  3.7T  0 part
  └─bcache3 250:3    0  3.7T  0 disk /
sdb           8:16   0  3.7T  0 disk
└─bcache2   250:2    0  3.7T  0 disk
sdc           8:32   0  3.7T  0 disk
└─bcache1   250:1    0  3.7T  0 disk
sdd           8:48   0  3.7T  0 disk
└─bcache0   250:0    0  3.7T  0 disk
sde           8:64   0  3.7T  0 disk
└─bcache4   250:4    0  3.7T  0 disk
sr0          11:0    1 1024M  0 rom
loop0         7:0    0   83M  1 loop /snap/core/3017
loop1         7:1    0  4.4M  1 loop /snap/canonical-livepatch/26
nvme0n1     259:0    0  1.8T  0 disk
├─bcache0   250:0    0  3.7T  0 disk
├─bcache1   250:1    0  3.7T  0 disk
├─bcache2   250:2    0  3.7T  0 disk
├─bcache3   250:3    0  3.7T  0 disk /
└─bcache4   250:4    0  3.7T  0 disk
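One way to see which physical devices each /dev/bcacheN is stacked on during the current boot is to read the `slaves/` links under `/sys/block/bcacheN/`, the same holder/slave relationships that lsblk displays as a tree. A minimal sketch, assuming a standard sysfs layout; the function name `list_bcache_slaves` and the `SYSFS_ROOT` override are hypothetical, the latter only so the function can be exercised against a fake tree.

```shell
#!/bin/sh
# Hypothetical helper: for each bcache device, print its name followed by the
# block devices it is built on (backing partition/disk and cache device), as
# listed in sysfs on the *current* boot. SYSFS_ROOT exists only for testing.
list_bcache_slaves() {
    root=${SYSFS_ROOT:-/sys}
    for dir in "$root"/block/bcache[0-9]*; do
        [ -d "$dir" ] || continue      # glob did not match: no bcache devices
        name=${dir##*/}
        slaves=""
        for s in "$dir"/slaves/*; do
            [ -e "$s" ] || continue    # empty slaves/ directory
            slaves="$slaves ${s##*/}"
        done
        printf '%s:%s\n' "$name" "$slaves"
    done
}
```

On the node above this would be expected to report sda3 and nvme0n1 under bcache3. It only describes the current enumeration; it does not make the names stable.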

The only thing you can address reliably is the UUID of a file system on a given device, because then /etc/fstab does not reference a device name at all.

This limits what can be done for applications that need reliable raw block device names, such as ceph. A file system has to be pre-created, which forces ceph into "directory mode": with filestore, both the "data device" and the "journal device" become files on the file system you pre-created, which hurts performance.

We need to address this if we are to use bcache with bluestore, which requires raw block devices to achieve its performance characteristics. The whole idea was to remove the file system from the equation, but with this enumeration randomness we have to create one beforehand.

Ryan Harper (raharper) wrote :

@dmitriis

It appears that we have an additional bug in the dname links for bcache devices.

I've filed a bug to track fixing the dname links for bcache devices:

https://bugs.launchpad.net/curtin/+bug/1728742

Dmitrii Shcherbakov (dmitriis) wrote : Re: [Bug 1676991] Re: bcache mounts inconsistent after node reboots

Ryan,

Thanks, I've commented on it with the information I dug up.

We need a reliable way to identify a given bcache device even when no
file system or GPT is present on it. I think this will require either
kernel modifications or very clever udev rules to make Juju storage
usable (in other words, to make sure that MAAS tags refer to the bcache
device names actually used on the target system).
