Cannot deploy Centos7 with xfs when using Focal as commissioning image

Bug #1958433 reported by Ioanna Alifieraki
This bug affects 4 people
Affects   Status         Importance   Assigned to                  Milestone
MAAS      Fix Released   High         Mauricio Faria de Oliveira
  2.8     Won't Fix      Undecided    Unassigned
  2.9     Fix Released   Medium       Unassigned
  3.0     Fix Released   Medium       Mauricio Faria de Oliveira
  3.1     Fix Released   Undecided    Unassigned
curtin    Invalid        Undecided    Unassigned

Bug Description

When Focal is set as the commissioning image and CentOS 7 is deployed with XFS, the machine keeps rebooting.

The reboots happen because the root partition is mounted read-only and cannot be remounted read-write:
[ 1.627587] XFS (sda2): Superblock has unknown read-only compatible features (0x4) enabled.
[ 3.115579] XFS (sda2): ro->rw transition prohibited on unknown (0x4) ro-compat filesystem

If Bionic is used as the commissioning image, CentOS 7 with XFS deploys without problems.
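
The 0x4 in the kernel messages above is a bit mask of read-only-compatible superblock features. As a minimal sketch, assuming the bit assignments from the upstream Linux kernel header `fs/xfs/libxfs/xfs_format.h` (they are not quoted anywhere in this report), the mask can be decoded as follows. `decode_ro_compat` is a hypothetical helper name used only for illustration:

```python
# Read-only-compatible XFS superblock feature bits.
# Assumption: values taken from the upstream Linux kernel header
# fs/xfs/libxfs/xfs_format.h; they are not part of this bug report.
XFS_RO_COMPAT_FEATURES = {
    0x1: "finobt",
    0x2: "rmapbt",
    0x4: "reflink",
    0x8: "inobtcount",
}


def decode_ro_compat(mask):
    """Return the names of the feature bits set in `mask`."""
    names = [
        name
        for bit, name in sorted(XFS_RO_COMPAT_FEATURES.items())
        if mask & bit
    ]
    # Report any bits this table does not know about instead of hiding them.
    unknown = mask & ~sum(XFS_RO_COMPAT_FEATURES)
    if unknown:
        names.append("unknown(0x%x)" % unknown)
    return names


print(decode_ro_compat(0x4))  # ['reflink']
```

Under that assumption, a kernel that rejects 0x4 is one that does not know the reflink feature, which matches the "reflink=0" experiment described in this report.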

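The root cause is an XFS on-disk format incompatibility between older and newer versions: the mkfs.xfs in the Focal commissioning image creates the filesystem with a feature (reflink) that the older CentOS 7 kernel does not support.
To confirm this, I passed "reflink=0" to the mkfs.xfs command with the following hack: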
diff --git a/curtin/block/mkfs.py b/curtin/block/mkfs.py
index ea5f09dd..abfefdb1 100644
--- a/curtin/block/mkfs.py
+++ b/curtin/block/mkfs.py
@@ -84,7 +84,7 @@ family_flag_mappings = {
              "ext": ("-U", "{uuid}"),
              "reiserfs": ("--uuid", "{uuid}"),
              "swap": ("--uuid", "{uuid}"),
-             "xfs": ("-m", "uuid={uuid}")},
+             "xfs": ("-m", "uuid={uuid},reflink=0")},
 }

 release_flag_mapping_overrides = {

With this hack I was able to deploy CentOS 7 with XFS using Focal as the commissioning image.
However, this quick hack only confirms the root cause and is not a good fix.
I am opening this bug to investigate a more appropriate solution.

Tags: sts


Changed in maas:
status: New → Confirmed
Changed in curtin:
status: New → Confirmed
Changed in maas:
status: Confirmed → New
Changed in curtin:
status: Confirmed → Incomplete
status: Incomplete → New
Alberto Donato (ack)
Changed in maas:
status: New → Triaged
importance: Undecided → High
milestone: none → next
ubu one (ubu1244one) wrote :

A solution could also be to use a higher kernel version.

You can download the CentOS 7 cloud image, edit it and add kernel-lt (or kernel-ml) to it from elrepo-kernel (https://elrepo.org/tiki/HomePage).

After that, during deployment you will have to regenerate the initramfs with dracut and uninstall the "normal" kernel package (then it will use kernel-lt) using curtin commands. This is necessary because MAAS is only set up to regenerate the initramfs of the "kernel" package.

Mauricio Faria de Oliveira (mfo) wrote :

For a workaround specific to CentOS 7 with curtin userdata (install template),
and additional info for root cause between Bionic/Focal commissioning images,
see bug 1965587 comments 12-14.

Derek DeMoss (derek-omnivector) wrote :

@ubu1244one, if you mean we could use a custom CentOS image, unfortunately that won't work with the normal workflow since for some reason Juju can't trigger a custom image deployment in MAAS. See: https://bugs.launchpad.net/juju/+bug/1968234

I'm planning to test @mfo's workaround in a bit today :)

Changed in curtin:
status: New → Invalid
Changed in maas:
status: Triaged → Confirmed
Mauricio Faria de Oliveira (mfo) wrote :

I've been looking into this, coming from bug 1965587.

IMHO, the issue is in MAAS, not curtin:

- Curtin provides an implementation for the storage config,
which has 'extra_options' for 'mkfs'; MAAS doesn't set it.

- And it's MAAS that has sufficient information to decide when to set it
(filesystem type, target os/release, and provisioning os/release;
curtin only knows the first).
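
The point about who holds the necessary information can be sketched as below. This is a hypothetical illustration only, not MAAS or curtin code: the function name, the release strings, and the set of affected targets are all assumptions made for the example. The provisioning release is deliberately left out, on the assumption that passing 'reflink=0' is harmless wherever mkfs.xfs accepts the option.

```python
# Hypothetical sketch: none of these names exist in MAAS or curtin.
# Targets whose kernel is assumed not to understand the XFS reflink feature.
TARGETS_WITHOUT_REFLINK = {("centos", "centos70")}


def mkfs_extra_options(fstype, target_os, target_release):
    """Return extra mkfs arguments for a filesystem on a given target."""
    if fstype == "xfs" and (target_os, target_release) in TARGETS_WITHOUT_REFLINK:
        return ["-m", "reflink=0"]
    return []


print(mkfs_extra_options("xfs", "centos", "centos70"))
print(mkfs_extra_options("xfs", "ubuntu", "focal"))
```

A decision like this needs the target os/release, which is why it would have to live on the side that knows it.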

Mauricio Faria de Oliveira (mfo) wrote :

So, it seems there are 2 approaches to address this at MAAS:

1) MAAS server detects XFS on CentOS 7, and sets 'extra_options' for 'mkfs' with '-m reflink=0'.
2) Curtin Userdata install template for CentOS checks for CentOS 7 and creates 'mkfs.xfs' wrapper.

Attaching patches for both options. Comments are welcome!

Option 1 wasn't tested.
Option 2 tested/works!

...

The key difference, in favor of option 2, is that we can check
whether 'mkfs.xfs' actually supports the '-m reflink=0|1' option.

It shouldn't be an issue, as the option was introduced in xfsprogs upstream
in 2016-10 and downstream in Ubuntu Artful (the earliest deployment
image must be Bionic, right?)

But if for whatever reason a user chooses (can they?) another, or older,
Linux distro for deployment, with a 'mkfs.xfs' that doesn't support
'-m reflink=0|1', we cannot pass the option, as mkfs.xfs would fail.
... And we actually don't even have to, as reflink won't be used then.
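
The support check that favors option 2 can be sketched as follows. It is a minimal illustration, assuming that 'mkfs.xfs' run without arguments prints its usage text, and that the usage text contains the literal string 'reflink=0|1' when the option is supported. The function names are hypothetical.

```python
import shutil
import subprocess


def usage_mentions_reflink(usage_text):
    """True if the mkfs.xfs usage text advertises the reflink option."""
    return "reflink=0|1" in usage_text


def mkfs_xfs_supports_reflink(binary="mkfs.xfs"):
    """Run mkfs.xfs without arguments and inspect its usage output."""
    path = shutil.which(binary)
    if path is None:
        return False
    # Assumption: with no device argument, mkfs.xfs only prints usage
    # (to stdout or stderr) and exits; nothing is formatted.
    result = subprocess.run(
        [path],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=False,
    )
    return usage_mentions_reflink(result.stdout)
```

If the option is not advertised, nothing needs to be done: such an mkfs.xfs cannot create a reflink filesystem in the first place.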

Mauricio Faria de Oliveira (mfo) wrote :
Mauricio Faria de Oliveira (mfo) wrote :
Alexsander de Souza (alexsander-souza) wrote :

I personally prefer option #2, because it doesn't introduce another special case in MAAS code and can also be used by other distros with the same issue.

Do you want to put up a Merge Proposal with this? I can help you with the process if you need.

Mauricio Faria de Oliveira (mfo) wrote :

Hey Alexsander,

Thanks for your feedback!

Sure, I'll send a MP next week and ping you in case I need help.

Changed in maas:
status: Confirmed → In Progress
assignee: nobody → Mauricio Faria de Oliveira (mfo)
Mauricio Faria de Oliveira (mfo) wrote :

Hi Alexsander,

Please see MP [1] with the workaround proposal.

I've changed it a bit so it works on Bionic commissioning images too
(paths/args to echo and mkfs.xfs).

Tested to work fine on Bionic/Focal/Jammy commissioning images
to deploy CentOS 7 with an XFS partition.

[1] https://code.launchpad.net/~mfo/maas/+git/maas-1/+merge/421617

...

However,

I'm afraid that in the SNAP version such changes aren't effective by default,
as the preseed files are shipped as '.sample' files, right?
(Please see details below).

If that is correct, even though this isn't particular to this change,
I'd guess this isn't too useful in the sense that users would still hit
this by default, and have to figure out on their own about the issue,
and rename/take the new changes from 'curtin_userdata_centos.sample' ?

Thanks!
Mauricio

Details:
---

For DEB, 'setup.cfg' fills '/etc/maas/preseeds' with 'contrib/preseeds_v2/curtin_userdata_centos' and others, with the same filenames.

For SNAP, 'snap/hooks/install' does 'cp $preseed ${preseed}.sample' (i.e., '.sample' suffix).

BUT in 'src/maasserver/preseed.py' (the only consumer of 'PRESEED_TEMPLATE_LOCATIONS',
set in 'src/maasserver/djangosettings/{snap,settings,development}.py'),
the functions 'get_preseed_template', 'get_preseed_filenames', 'load_preseed_template'
don't check for a 'sample' suffix.

(Neither do the 'migrate'/'reconfigure-supervisord' calls in 'snap/hooks/install'.)

Changed in maas:
status: In Progress → Fix Committed
Mauricio Faria de Oliveira (mfo) wrote :

I discussed comment #10 with Alexsander today.

Indeed, there are some issues to make the fix
based on curtin userdata files effective with
SNAP builds (cannot overwrite user cfg files;
with DEB, dpkg will prompt/interact w/ user).

The other approach in the maasserver code has
other downsides.

So, I'll look for another method to check the
release and configure a mkfs.xfs wrapper that
runs always / not with curtin userdata files.
(and then revert the committed changes.)

Mauricio Faria de Oliveira (mfo) wrote :

I take that back. :)

Alexsander mentioned something I missed earlier.

We apparently just need to update docs,
and I have to read code more carefully. :)

...

The SNAP build _HAS_ `/etc/maas/preseeds/` as well;
it's in the read-only side (`/snap`),
not in the read-write side (`/var/snap`).

If I look closer than I did in comment #10:

The SNAP build also runs `setup.cfg` per the `python` plugin [1],
and as such, populates `/etc/maas/preseeds` with same filenames.

 @ snap/snapcraft.yaml

  parts:
    maas:
      plugin: python

 [1] https://snapcraft.io/docs/python-plugin

And the preseed template locations for the snap configuration
file actually include _BOTH_ of them (which I read but didn't
investigate properly):

 @ src/maasserver/djangosettings/snap.py

  PRESEED_TEMPLATE_LOCATIONS = (
      os.path.join(os.environ["SNAP_DATA"], "preseeds"),
      os.path.join(os.environ["SNAP"], "etc", "maas", "preseeds"),
  )

During my test, that couldn't be picked up because of an
implementation detail / design decision, it seems:

A more generic template file in the `/var/snap` dir is _preferred over_
a more specific template file in the `/snap` dir.

See the order of the 2 for-loops here:

 @ src/maasserver/preseed.py

  def get_preseed_template(filenames):
      """Get the path and content for the first template found.

      :param filenames: An iterable of relative filenames.
      """
 ...
      for location in settings.PRESEED_TEMPLATE_LOCATIONS:
          for filename in filenames:
              filepath = os.path.join(location, filename)
              try:
                  with open(filepath, encoding="utf-8") as stream:
                      content = stream.read()
              except OSError:
                  pass  # Ignore.
              else:
                  return filepath, content
 ...

In my test server, there was a more generic file in /var/snap:
- /var/snap/maas/current/preseeds/curtin_userdata

which prevented the centos specific file in /snap/ from being picked:
- /snap/maas/current/etc/maas/preseeds/curtin_userdata_centos

Once I removed /var/snap/.../curtin_userdata, the changes
in /snap/.../curtin_userdata_centos were effective.
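
The effect of that loop order can be reproduced with a small standalone sketch. It only mimics the lookup with temporary directories standing in for the two preseed locations; the helper name, the directory names, and the candidate filename order (most specific first) are assumptions made for the example.

```python
import os
import tempfile


def first_template(locations, filenames):
    """Mirror the loop order quoted above: locations outer, filenames inner."""
    for location in locations:
        for filename in filenames:
            filepath = os.path.join(location, filename)
            if os.path.isfile(filepath):
                return filepath
    return None


with tempfile.TemporaryDirectory() as root:
    # Stand-ins for the read-write (/var/snap) and read-only (/snap) dirs.
    rw_dir = os.path.join(root, "var-snap-preseeds")
    ro_dir = os.path.join(root, "snap-preseeds")
    os.makedirs(rw_dir)
    os.makedirs(ro_dir)
    open(os.path.join(rw_dir, "curtin_userdata"), "w").close()
    open(os.path.join(ro_dir, "curtin_userdata_centos"), "w").close()

    # Assumed candidate order: most specific name first.
    names = ["curtin_userdata_centos", "curtin_userdata"]
    chosen = first_template([rw_dir, ro_dir], names)
    # The generic file in the first location wins over the specific one.
    print(os.path.relpath(chosen, root))
```

Because every filename is tried in the first location before the second location is consulted at all, any match in the read-write directory shadows the whole read-only directory.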

...

Well, that seems to be _the right thing to do_...

Specifically because it allows users to have their own,
_more generic_ template file that works on all releases,
and ignores the release-based templates shipped in SNAP.

...

I just think we should update the documentation about it,
right now it has 2 issues:

1) It only mentions the `/var/snap/maas/current/preseeds/`
path, which might puzzle users seeing behavior specified
in the (unknown) `/snap/maas/current/etc/maas/preseeds/` location.

2) The paths shown for DEB/SNAP incorrectly change with
the UI/CLI view selection, not with the DEB/SNAP selection.

https://maas.io/docs/about-customising-machines#heading--templates

We should fix it here: (note the 2 `[tab]` tags)

@ src/maas-offline-docs/src/src/about-customising-machines-5976.md

132 [tabs]
133 [tab version="v3.2 Snap,v3.2 Packages,v3.1 Snap,v3.1 Packages,v3.0 Snap,v3.0 Packages,v2.9 Snap,v2.9 Packages" view="UI"]
134 ... ...


Changed in maas:
milestone: next → 3.2.0
milestone: 3.2.0 → next
Changed in maas:
milestone: next → 3.2.0
Mauricio Faria de Oliveira (mfo) wrote :

Submitted MRs with (clean) backports for the stable branches (3.1, 3.0, 2.9).

All tested successfully with MAAS SNAP 3.1/3.0/2.9.

Test Steps:
---

Scenario:
- 1 maas VM
- 1 test VM (boot from network/hd)
- 1 bridge with both VMs (no DHCP)

1) Deploy MAAS 3.1/3.0/2.9 SNAP from stable channel:

 $ sudo snap install --channel=3.1/stable maas
 $ sudo snap install --channel=3.1/stable maas-test-db
 $ sudo maas init region+rack --database-uri maas-test-db:///
 $ sudo maas createadmin --username admin --password admin --email <email address hidden> --ssh-import lp:mfo

2) Configure MAAS server in the Web UI
- Images: Ubuntu 18.04/20.04/22.04 and CentOS 7/8.
- Subnet: Enable DHCP in subnet/vlan w/ bridge to test VM.

3) Setup test VM in MAAS
- Boot test VM (enlist in MAAS)
- Edit test VM (set power: manual)
- Provision it (start it manually)
- Configure it (add XFS partition/mount)

4) Setup deployment tests (stop/start VM as needed)
- Configure the commissioning image to Ubuntu 20.04 (or other versions later)
- Deploy CentOS 7 (or 8 later)

5) Tests and results:
[commissioning image / deployed OS / original or modified curtin_userdata_centos]

- 20.04 / CentOS 7 / original: FAIL (problem reproduces; see [1])
- 20.04 / CentOS 7 / modified: PASS (problem fixed; see [2])
- 20.04 / CentOS 8 / modified: PASS (no regression; wrapper not set up; see [3])

- 18.04 / CentOS 7 / modified: PASS (no regression; wrapper set up, reflink still off; see [4])
- 18.04 / CentOS 8 / modified: PASS (no regression; wrapper not set up)

- 22.04 / CentOS 7 / modified: PASS (problem fixed)
- 22.04 / CentOS 8 / modified: PASS (no regression; wrapper not set up)

*)

In order to test the changes, just replace the existing
'curtin_userdata_centos' file in snap's read-only side,
with the updated one, using a bind mount:

Before)

 $ diff -U0 /snap/maas/current/etc/maas/preseeds/curtin_userdata_centos ~/curtin_userdata_centos | tail -n+4
 +early_commands:
 + centos70_xfs_lp1958433: [ '/bin/sh', '-c', 'if [ "{{release}}" = "centos70" ] && mkfs.xfs 2>&1 | grep -q "reflink=0|1"; then WRAPPER=/usr/local/sbin/mkfs.xfs; echo "#!/bin/sh" >$WRAPPER && echo "exec $(which mkfs.xfs) -m reflink=0 \"\$@\"" >>$WRAPPER && chmod +x $WRAPPER && echo "Wrapper: $WRAPPER" && cat $WRAPPER; fi' ]
 +
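
For readability, the shell one-liner in the 'early_commands' entry above roughly corresponds to the following Python sketch. It is an illustration only and not what MAAS ships: the function name and parameters are hypothetical, and the wrapper is written to a caller-supplied path rather than to /usr/local/sbin.

```python
import os
import stat
import tempfile


def write_mkfs_xfs_wrapper(wrapper_path, real_mkfs, release, usage_text):
    """Create a wrapper that forces '-m reflink=0' for centos70 only.

    Returns True if the wrapper was written, False if it was not needed.
    """
    if release != "centos70" or "reflink=0|1" not in usage_text:
        return False
    script = '#!/bin/sh\nexec {} -m reflink=0 "$@"\n'.format(real_mkfs)
    with open(wrapper_path, "w") as stream:
        stream.write(script)
    # Equivalent of 'chmod +x' on the wrapper.
    mode = os.stat(wrapper_path).st_mode
    os.chmod(wrapper_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return True


with tempfile.TemporaryDirectory() as tmp:
    usage = "[-m crc=0|1,finobt=0|1,uuid=xxx,rmapbt=0|1,reflink=0|1]"
    path = os.path.join(tmp, "mkfs.xfs")
    created = write_mkfs_xfs_wrapper(path, "/sbin/mkfs.xfs", "centos70", usage)
    with open(path) as stream:
        content = stream.read()
    is_executable = os.access(path, os.X_OK)
    # Any other release leaves mkfs.xfs untouched.
    skipped = write_mkfs_xfs_wrapper(
        os.path.join(tmp, "other"), "/sbin/mkfs.xfs", "centos8", usage
    )
    print(content)
```

The approach relies on the wrapper directory coming before the real binary's directory in PATH, so that later 'mkfs.xfs' calls resolve to the wrapper; whether that holds in a given environment is an assumption to verify.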

Switch)

 $ sudo mount --bind ~/curtin_userdata_centos /snap/maas/current/etc/maas/preseeds/curtin_userdata_centos

After)

 $ diff -U0 /snap/maas/current/etc/maas/preseeds/curtin_userdata_centos ~/curtin_userdata_centos | tail -n+4
 $

To revert it back:

 $ sudo umount /snap/maas/current/etc/maas/preseeds/curtin_userdata_centos

To uninstall/move to another version:

 $ sudo snap remove --purge maas
 $ sudo snap remove --purge maas-test-db

Examples:
---

[1] 20.04 / CentOS 7 / original: FAIL

 ...
 Ubuntu 20.04.4 LTS vmaas ttyS0
 ...
 [ 78.349650] cloud-init[1430]: start: cmd-install/stage-early: preparing for installation
 [ 78.366818] cloud-init[1430]: stage_early took 0.000 seconds
 [ 78.369113] cloud-init[1430]: finish: cmd-install/stage-early: SUCCESS: preparing for installation
 ...
 ...
 [ OK ] Reached target Local File Systems (Pre).
 ...
   Mounting /xfs...
 ...
 [ 4.363480] SGI XFS with ...


Changed in maas:
milestone: 3.2.0 → 3.2.0-beta5
status: Fix Committed → Fix Released
Mauricio Faria de Oliveira (mfo) wrote :

3.2: fix released in 3.2.0

 $ git log --oneline --grep 1958433 origin/master
 9eef25a55a36 LP:1958433 curtin workaround for xfs in centos70

 $ git describe --contains 9eef25a55a36
 3.2.0-beta5~60

3.1: fix released in 3.1.1

 $ git log --oneline --grep 1958433 origin/3.1
 d5a2b49e4094 [cherry-pick from commit 9eef25a55a36 ("LP:1958433 curtin workaround for xfs in centos70")]

 $ git describe --contains d5a2b49e4094
 3.1.1-rc1~13

3.0: fix committed (target 3.0.1)

 $ git log --oneline --grep 1958433 origin/3.0
 6e85109153f5 [cherry-pick from commit 9eef25a55a36 ("LP:1958433 curtin workaround for xfs in centos70")]

 $ git describe --contains 6e85109153f5
 fatal: cannot describe '6e85109153f5ad33d662d373f7795d719fc4bf6b'

 $ git describe 6e85109153f5
 3.0.0-11-g6e85109153f5

2.9: fix committed (target 2.9.3)

 $ git log --oneline --grep 1958433 origin/2.9
 d8eddd4d518f [cherry-pick from commit 9eef25a55a36 ("LP:1958433 curtin workaround for xfs in centos70")]

 $ git describe --contains d8eddd4d518f
 2.9.3-beta2test~5

2.8: pending

 $ git log --oneline --grep 1958433 origin/2.8
 $

tags: added: sts
Heitor Alves de Siqueira (halves) wrote :

MAAS 3.0.1 has now been released, and is available as below:
- deb: ppa:maas/3.0 (3.0.1-10052-g.82c730c57-0ubuntu1~20.04.1)
- snap: 3.0/stable (3.0.1-10052-g.82c730c57)
