Merge ~crass/casper:fix-unsafe-mount-iso-loopback into casper:main

Proposed by Glenn Washburn
Status: Needs review
Proposed branch: ~crass/casper:fix-unsafe-mount-iso-loopback
Merge into: casper:main
Diff against target: 43 lines (+18/-1)
2 files modified
scripts/casper (+2/-0)
scripts/casper-helpers (+16/-1)
Reviewer Review Type Date Requested Status
Dimitri John Ledkov Needs Information
Steve Langasek Needs Information
Canonical Hardware Enablement hwe Pending
Review via email: mp+453083@code.launchpad.net

Description of the change

This fixes a very old issue where booting the iso via loopback can mount hibernated filesystems which can lead to data loss and corruption of the hibernation image. A kernel parameter, "unsafe-mount", is added which allows the old behavior. This can be tested by the following steps:
 1. Store the livecd iso on a journalled, say ext4, filesystem
 2. hibernate with that filesystem mounted
 3. boot the iso via loopback
 4. reboot into hibernated system and see if resume succeeds

To post a comment you must log in.
Revision history for this message
Steve Langasek (vorlon) wrote :

Thanks for this proposal. As this is a substantive change, I would like to see a bug opened against https://bugs.launchpad.net/ubuntu/+source/casper before accepting it, so that we ensure we have a way to track discussion/feedback from users in the future.

A few thoughts:
- Ubuntu does not support hibernation anymore, for various reasons; TL;DR: the usability of it is sufficiently poor, and the power savings advantage sufficiently limited versus suspend on modern systems, that it was not worth carrying on. I don't know what other Linux distributions do with respect to hibernation support these days, but if the situation is similar, then I think this is an uncommon use case.
- This patch changes the default behavior of casper when loading images from nearly all supported writable filesystems. I'm not sure we should change the default. It is *safer*, but it reduces usability for most of our users (again, a hibernated Ubuntu filesystem is uncommon; booting a livefs on a system that's hibernated is even more uncommon). This is partly why I think we should have a bug report attached to this discussion.
- The mount manpage mentions using '-o ro,noload' with ext3 and ext4 filesystems to let it be mounted without the journal being replayed. Wouldn't this be a robust solution that doesn't require refusing to load from ext3/ext4 filesystems by default?
- This is only an issue when the user boots a casper-based system *specifically pointing it* at an image which is located on a journalling filesystem that has a dirty journal because it was mounted by a Linux system that has been hibernated and will be confused by a journal replay happening behind its back. So uh seriously, who does that and why?

Revision history for this message
Steve Langasek (vorlon) wrote :

Oh, sorry, I just noticed that this is linked to LP: #1117665. Will take a look there.

Revision history for this message
Steve Langasek (vorlon) wrote :

Ok looking at that bug report, I see that it says:

> When a hibernated windows partition exists the ntfs3g mounter will refuse to mount the partition in
> read/write mode and returns an error. This will cause the search script to die and the boot process to
> stop, bringing the system to a debugging shell in the initrd.

But it's straightforward to just mount the filesystem read-only instead, and you're using a much bigger hammer here to disallow even ATTEMPTING to mount the filesystem by default. And the net result is not a better user experience for the user vs what is described in that bug report, because you say the ntfs3g filesystem fails to mount and thus blocks the boot - which is the same result as when casper refuses to try to mount it at all.

I see that you point out that casper actually scans filesystems to find the image, so it's not a matter of having wrongly told casper to look at a SPECIFIC device that has a dirty journal. Nevertheless, the bug report does not support the changes proposed here.

Revision history for this message
Steve Langasek (vorlon) :
review: Needs Information
Revision history for this message
Glenn Washburn (crass) wrote :

> Thanks for this proposal. As this is a substantive change, I would like to
> see a bug opened against https://bugs.launchpad.net/ubuntu/+source/casper
> before accepting it, so that we ensure we have a way to track
> discussion/feedback from users in the future.
>
> A few thoughts:
> - Ubuntu does not support hibernation anymore, for various reasons; TL;DR: the
> usability of it is sufficiently poor, and the power savings advantage
> sufficiently limited versus suspend on modern systems, that it was not worth
> carrying on. I don't know what other Linux distributions do with respect to
> hibernation support these days, but if the situation is similar, then I think
> this is an uncommon use case.

Yes, I imagine it is. And I shouldn't need to point out that users with uncommon setups are no less deserving of data loss protection.

> - This patch changes the default behavior of casper when loading images from
> nearly all supported writable filesystems. I'm not sure we should change the
> default. It is *safer*, but it reduces usability for most of our users
> (again, a hibernated Ubuntu filesystem is uncommon; booting a livefs on a
> system that's hibernated is even more uncommon). This is partly why I think
> we should have a bug report attached to this discussion.

This is reasonable. So you want a bug report that it explicit about the data loss scenario, correct?

> - The mount manpage mentions using '-o ro,noload' with ext3 and ext4
> filesystems to let it be mounted without the journal being replayed. Wouldn't
> this be a robust solution that doesn't require refusing to load from ext3/ext4
> filesystems by default?

For these filesystems, it might be a robust solution. Personally, I would feel uneasy about it. I'd rather a better guarantee that nothing gets written to the filesystem. For instance, that may still write to the superblock and I don't know if that will corrupt the resume process. I also don't like this because its not general. What about XFS and BTRFS?

> - This is only an issue when the user boots a casper-based system
> *specifically pointing it* at an image which is located on a journalling
> filesystem that has a dirty journal because it was mounted by a Linux system
> that has been hibernated and will be confused by a journal replay happening
> behind its back. So uh seriously, who does that and why?

I think you addressed this in a later comment.

Revision history for this message
Glenn Washburn (crass) wrote :
Download full text (6.1 KiB)

> Ok looking at that bug report, I see that it says:
>
> > When a hibernated windows partition exists the ntfs3g mounter will refuse to
> mount the partition in
> > read/write mode and returns an error. This will cause the search script to
> die and the boot process to
> > stop, bringing the system to a debugging shell in the initrd.
>
> But it's straightforward to just mount the filesystem read-only instead, and

Yes, more straight forward for NTFS, but not general. Are you suggesting have special logic for NTFS?

You appear to understand that replaying a journal of on a hibernated filesystem will corrupt the resume. I'm not sure if ntfs3g replays the journal when mounting in readonly, but the kernel does for ext3/4. I believe I've seen BTRFS fail to mount on readonly block devices, so I think there's a similar problem there. And I'm unsure of XFS, but let's error on the safe side. This appears to be why the allowed filesystems for the persistent storage file are limited to FAT[1] (though it could and should be opened to more filesystems).

As you'll note, the merge description does not mention the scenario of the linked bug report, which is not a data loss scenario. That was intentional. The bug was linked only because it is also fixed by these changes. The primary intent of these changes is to fix the data loss issue.

> you're using a much bigger hammer here to disallow even ATTEMPTING to mount
> the filesystem by default.

Yes, that is intentional for safety reasons. Its more important to err on the side of not allowing ISOs on some filesystems by default, than to risk data-loss from a corrupted hibernation image.

Now, there are techniques that will allow the mounting of any filesystem read/write without writing any data to the physical block device. But that requires more complexity (and additional kernel modules) and because of that I figured it would be harder to be accepted. Another option that is simpler but I'm more wary of is setting the block device itself to readonly (blockdev --setro). Care would need to be taken that block devices can not be used before being set to readonly. And code would need to be added to remove the readonly status for the persistent storage block device (maybe others). This solution is simpler and I think strikes a good trade-off of by default more limited functionality for avoidance of data loss, with the option to go back to the old behavior if really needed.

> And the net result is not a better user experience
> for the user vs what is described in that bug report, because you say the
> ntfs3g filesystem fails to mount and thus blocks the boot - which is the same
> result as when casper refuses to try to mount it at all.

Your statement is only true _if_ the ISO is on the hibernated NTFS filesystem. Using ISOs on hibernated filesystems are not supported with this patch. As noted above, the more complex solution that allows file access on hibernated filesystems without writing to them would allow this to be supported. But this is not that important to me personally.

A more common case is that the ISO will be on a non-journalled filesystem (eg. FAT). Currently, this would still be a problem when t...

Read more...

Revision history for this message
Glenn Washburn (crass) wrote :

The newly linked bug #41624 is a good read on the issue. In comment #19, Collin Watson mentions that the bug "definitely also affects iso-scan", which is what this merge fixes.

[1] https://bugs.launchpad.net/ubuntu/+source/partman-basicfilesystems/+bug/41624/comments/19

Revision history for this message
Glenn Washburn (crass) wrote :

In bug #230703 comment #7[1], Collin Watson says in response to a comment about this data loss bug in lupin "yes, it is a real problem elsewhere, and I'm pretty sure there are bugs about it. That doesn't justify introducing the same bug here, in a much more widely used case. Far more people use the Ubuntu live CD than the other cases you mentioned, and it is extremely important that it not trash the system by default."

He goes on to say "Hibernating and booting a live CD is a much more plausible real-world scenario than either hibernating and performing a USB stick installation or hibernating and performing a Wubi installation."

And then "This change should be reverted, and a comment should be added above the code in question to indicate that it must not be modified to include any filesystem type that might be able to mount a journalled filesystem, so that we don't forget this in future."

Seems like he considered this a serious issue over 15 years ago...

[1] https://bugs.launchpad.net/ubuntu/+source/casper/+bug/230703/comments/7

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

I am not too sure, but I am suspecting this may affect or break factory provisioning and OEM recovery.

review: Needs Information
Revision history for this message
Steve Langasek (vorlon) wrote :

On Thu, Jan 18, 2024 at 04:15:02AM -0000, Glenn Washburn wrote:
> Seems like he considered this a serious issue over 15 years ago...

15 years ago, hibernation was much more common and suspending to RAM
consumed a lot more power.

15 years ago, the current behavior was not the status quo that people have
come to rely on and there was no significant risk of regression as a result
of a categorical revert.

I have asked that you open a bug report where this can be discussed. Merge
proposals are inadequate for the level of nuanced discussion required here.

Revision history for this message
Glenn Washburn (crass) wrote :

I've created and added bug #2051898. I had previously asked for clarification on what you wanted in the bug, so I hope its what you had in mind.

Unmerged commits

89fd7dc... by Glenn Washburn

Allow mounting unsafe filesystems when the unsafe-mount boot param is used

If a user really wants to use an iso located on a filesystem that is
unsafe to mount (eg. a journalled filesystem), they must specify the
"unsafe-mount" kernel boot parameter.

5e5e576... by Glenn Washburn

Do not support iso loopback boot from isos on journalled filesystems

As noted else where in scripts/casper, mounting a filesystem in readonly
does not prevent the journal log from being replayed. This will cause
hibernation corruption and potential dataloss on those filesystems. So
only allow searching for the loopback iso on non-journalled filesystems.

LP: #1117665

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/scripts/casper b/scripts/casper
2index 36a724d..ee978fd 100644
3--- a/scripts/casper
4+++ b/scripts/casper
5@@ -40,6 +40,8 @@ parse_cmdline() {
6 export PERSISTENT="" NOPERSISTENT="Yes" ;;
7 persistent-path=*)
8 export PERSISTENT_PATH="${x#persistent-path=}" ;;
9+ unsafe-mount)
10+ export UNSAFE_MOUNT='Yes' ;;
11 ip=*)
12 STATICIP=${x#ip=}
13 if [ "${STATICIP}" = "" ]; then
14diff --git a/scripts/casper-helpers b/scripts/casper-helpers
15index 584dac2..926bdac 100644
16--- a/scripts/casper-helpers
17+++ b/scripts/casper-helpers
18@@ -350,9 +350,24 @@ find_files()
19 is_supported_fs(){
20 [ -z "${1}" ] && return 1
21 case ${1} in
22- ext2|ext3|ext4|xfs|jfs|reiserfs|vfat|ntfs|iso9660|btrfs|udf)
23+ # Do not add any filesystem types here that might be able to
24+ # mount a journalled filesystem and replay the journal. Doing so
25+ # will cause data loss when a live CD is booted on a system
26+ # where filesystems are in use by hibernated operating systems.
27+ ext2|vfat|iso9660|udf)
28 return 0
29 ;;
30+ ext3|ext4|xfs|jfs|reiserfs|ntfs|btrfs)
31+ # These filesystems are journalled filesystem and so are not safe
32+ # to mount, which will replay the journal, when they have been
33+ # mounted on a previously hibernated system. Doing this can cause
34+ # hibernation resume to fail and thus data loss to occur. Allow
35+ # this to be overridden by the user when they know for sure the
36+ # journalled filesystem is not part of a hibernation image.
37+ if [ -n ${UNSAFE_MOUNT} ]; then
38+ return 0
39+ fi
40+ ;;
41 esac
42 return 1
43 }

Subscribers

People subscribed via source and target branches