Bug #487744 “Scheduled fsck during boot unresponsive and inactiv...” : Bugs : mountall package : Ubuntu

ais523 (ais523) wrote on 2009-11-24:

#1

Dependencies.txt (829 bytes, text/plain; charset="utf-8")

Scott James Remnant (Canonical) (canonical-scott) wrote on 2009-12-02:

#2

I actually don't think you have an fsck at all, which would explain why Escape isn't working

affects:

e2fsprogs (Ubuntu) → mountall (Ubuntu)

ais523 (ais523) wrote on 2009-12-02:

#3

> I actually don't think you have an fsck at all, which would explain why
> Escape isn't working

This can't possibly be the case:
- Escape works just fine before the fsck reaches 89%
- fsck is showing a progress bar; if fsck isn't there, then how would
  the progress bar appear?
- "Filesystem checks are in progress (ESC to cancel):" strongly implies
  that something is trying to run fsck, and if it wasn't there I'd expect it
  to error out immediately
- "which fsck" and "which e2fsck" return /sbin/fsck and /sbin/e2fsck, as
  expected
- the bug reporting tool would have noticed if I tried to report a bug in
  a package I don't have installed

Scott James Remnant (Canonical) (canonical-scott) wrote on 2009-12-04: Re: [Bug 487744] Re: Scheduled fsck during boot hangs at 90%, preventing boot sequence completing

#4

On Wed, 2009-12-02 at 17:02 +0000, ais523 wrote:

> > I actually don't think you have an fsck at all, which would explain why
> > Escape isn't working
>
> This can't possibly be the case:
> - Escape works just fine before the fsck reaches 89%
>
Then stops working. Which is precisely my point.

> - fsck is showing a progress bar; if fsck isn't there, then how would
> the progress bar appear?
>
Actually fsck doesn't show the progress bar, something else is. If fsck
went away, and that something else hadn't noticed, the progress bar
would be still there ... and Escape wouldn't work.

> - "Filesystem checks are in progress (ESC to cancel):" strongly implies
> that something is trying to run fsck, and if it wasn't there I'd expect it
> to error out immediately
>
See above.

> - "which fsck" and "which e2fsck" return /sbin/fsck and /sbin/e2fsck, as
> expected
>
That just means they're installed. I meant that fsck stopped running at
90% (or got to 100% quicker than the progress bar noticed).

Scott
--
Scott James Remnant
<email address hidden>

ais523 (ais523) wrote on 2009-12-04: Re: Scheduled fsck during boot hangs at 90%, preventing boot sequence completing

#5

> > - fsck is showing a progress bar; if fsck isn't there, then how would
> > the progress bar appear?
> >
> Actually fsck doesn't show the progress bar, something else is. If fsck
> went away, and that something else hadn't noticed, the progress bar
> would be still there ... and Escape wouldn't work.

I assumed the progress bar was some variant of fsck -C; in previous versions of Ubuntu, the progress bar showed a lot of detail about what the fsck was doing, although nowadays it's a simple percentage. fsck unexpectedly exiting at 90% would explain about half the symptoms I'm getting, though (although it wouldn't directly explain why the system would lock up rather than continue thereafter).
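
The situation Scott describes, where a separate program draws the bar and may not notice the checker exiting, can be illustrated with a minimal Python sketch. It assumes the checker writes one progress line per update to a pipe in the form `pass current maximum device` (treat this format as an assumption here, not as a confirmed description of what mountall reads), and the function names are hypothetical. The point is that end-of-file on the pipe is the only signal that the checker is gone, so a display that ignores it keeps showing the last percentage.

```python
import io
from typing import Iterator, TextIO, Tuple


def parse_progress(line: str) -> Tuple[int, int, int, str]:
    """Parse one assumed progress line: 'pass current maximum device'."""
    pass_no, cur, maximum, device = line.split(None, 3)
    return int(pass_no), int(cur), int(maximum), device.strip()


def follow_progress(stream: TextIO) -> Iterator[str]:
    """Yield display strings until the writer closes its end of the pipe."""
    for line in stream:
        if not line.strip():
            continue
        pass_no, cur, maximum, device = parse_progress(line)
        # Percentage within the current pass only, not overall progress.
        percent = 100 * cur // maximum if maximum else 0
        yield f"{device}: pass {pass_no}, {percent}%"
    # Reaching here means EOF: the checker exited (or closed the descriptor).
    # A display that never handles this case would keep showing a stale bar.
    yield "checker finished or went away; clear the progress message"


if __name__ == "__main__":
    fake_pipe = io.StringIO("1 45 100 /dev/sda5\n1 90 100 /dev/sda5\n")
    for message in follow_progress(fake_pipe):
        print(message)
```

The final message stands in for whatever cleanup the real display would do once its input ends.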

ais523 (ais523) wrote on 2009-12-08:

#6

This bug seems to have stopped occurring now (at least, I can no longer reproduce it). Marking it as invalid for the time being; I'll reopen the bug if it happens again. (It was happening repeatably earlier, though, so it does not seem to be intermittent; it is probably triggered by some unknown cause.)

Changed in mountall (Ubuntu):
status:	New → Invalid

ais523 (ais523) wrote on 2010-02-27:

#7

Finally figured out what's going on here! I'm now on a different computer, and getting the same bug again; but it behaves slightly differently here. This computer has an ext4 filesystem originally created by Ubuntu Karmic, and has a rather smaller main partition (relevant to how I noticed what was going on); but the same thing's happening. (And yes, I'm now suspicious that the blame is mostly mountall's.)

What happens here if I don't press ESC to abort a scheduled fsck is exactly the same; it appears to proceed as normal until 89% (which goes very quickly on this computer, thus making it easier to test), the hard drive activity light goes off, the fsck progress goes to 90% after several minutes, and the system apparently completely hangs thereafter. However, after waiting for another half-hour or so, during which the system is apparently completely hung, the progress bar goes to 91%, and picks up thereafter, with the boot finally completed.

Pressing ESC works differently on this machine, though; instead of aborting the fsck, it switches to tty1, gives an error message ("General error mounting filesystems."), and drops me to a root prompt, with the advice that control-D should retry. After pressing control-D, the fsck resumes not at 0%, but at whatever percentage it was at when I pressed ESC; it's as if the fsck was not stopped at all, but only suspended (which, if true, would at least explain why the filesystems failed to mount).

As a test, I tried interrupting a scheduled fsck with ESC, restarting it with control-D, interrupting the same fsck again, restarting it again with control-D, and finally waiting until the fsck completed (including the huge hang at 89-91% for no apparent reason and with no hard drive activity). This is a dump of tty1 obtained after the boot sequence ended (obtained via /dev/vcs1):
{{{
General error mounting filesystems.
A maintenance shell will now be started.
CONTROL-D will terminate this shell and re-try.
root@desert:~# exit
mountall start/starting
fsck from util-linux-ng 2.16
swapon: /dev/disk/by-uuid/93f82bd9-13ba-48d0-b6e0-d56326ae15ea: swapon failed: Device or resource busy
mountall: swapon /dev/disk/by-uuid/93f82bd9-13ba-48d0-b6e0-d56326ae15ea [1003] terminated with status 255
mountall: Problem activating swap: /dev/disk/by-uuid/93f82bd9-13ba-48d0-b6e0-d56326ae15ea
/dev/sda5 has been mounted 25 times without being checked, check forced.
Filesystem checks are in progress (ESC to cancel):
[#######-----------------------------------------------------]
mountall: Cancelled
/dev/sda5: e2fsck canceled.
fsck.ext4: Inode bitmap not loaded while setting block group checksum info
mountall: fsck / [1001] terminated with status 8
mountall: General fsck error
init: mountall main process (1000) terminated with status 1
General error mounting filesystems.
A maintenance shell will now be started.
CONTROL-D will terminate this shell and re-try.
root@desert:~# exit
mountall start/starting
fsck from util-linux-ng 2.16
swapon: /dev/disk/by-uuid/93f82bd9-13ba-48d0-b6e0-d56326ae15ea: swapon failed: Device or resource busy
mountall: swapon /dev/disk/by-uuid/93f82bd9-13ba-48d0-b6e0-d56326ae15ea [1042] terminated with status 255
mountall: Problem activating swap: /dev/disk/by-uuid/93f82bd9-13ba-48d0-b6e0-d56326ae15ea
/dev/sda5 has been mounted 25 times without being checked, check forced.
Filesystem checks are in progress (ESC to cancel):
/dev/sda5: 1968337/4784128 files (0.2% non-contiguous), 13948395/19129390 blocks

Ubuntu 9.10 desert tty1

desert login:
}}}
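
For anyone wanting to capture a similar dump: `/dev/vcs1` holds the text of the first virtual console as one flat run of character cells with no newlines, so it has to be split into rows. The following Python sketch shows one way to do that. The 80-column width and the latin-1 decoding are assumptions (the real dimensions can be read from the header of `/dev/vcsa1`), the function names are hypothetical, and reading the device normally requires root.

```python
from typing import List


def screen_rows(raw: bytes, columns: int = 80) -> List[str]:
    """Split a flat virtual-console dump into rows of `columns` characters."""
    text = raw.decode("latin-1")  # one byte per character cell; never fails
    rows = [text[i:i + columns] for i in range(0, len(text), columns)]
    return [row.rstrip() for row in rows]


def dump_console(path: str = "/dev/vcs1", columns: int = 80) -> str:
    """Read the console device (usually needs root) and return printable text."""
    with open(path, "rb") as device:
        return "\n".join(screen_rows(device.read(), columns))


if __name__ == "__main__":
    # Two fake 80-column rows standing in for real console contents.
    sample = b"desert login:".ljust(80) + b"".ljust(80)
    print(screen_rows(sample))
```
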
Hopefully this is enough information to debug what's going on here.
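
As a reading aid for the log above: the fsck(8) manual page documents the exit status as a sum of bit flags, so the "terminated with status 8" line can be decoded mechanically. The Python sketch below does that; the function name is hypothetical, and the table is copied from the manual page rather than from mountall's source. Status 8 decodes as an operational error rather than 32 (cancelled by user request), which is consistent with the "Inode bitmap not loaded" message printed just before it.

```python
from typing import List

# Bit values as documented in the fsck(8) manual page.
FSCK_STATUS_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def describe_fsck_status(status: int) -> List[str]:
    """Return the meaning of each bit set in an fsck exit status."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_STATUS_BITS.items() if status & bit]


if __name__ == "__main__":
    print(describe_fsck_status(8))
```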

Changed in mountall (Ubuntu):
status:	Invalid → New
summary:
- Scheduled fsck during boot hangs at 90%, preventing boot sequence completing
+ Scheduled fsck during boot unresponsive and inactive for a very long time at 90%, making the system appear to hang

Scott James Remnant (Canonical) (canonical-scott) wrote on 2010-03-31:

#8

The mountall bug here is that it didn't clear the message when the fsck finished (and didn't show another).

Fix pending

Changed in mountall (Ubuntu):
status:	New → Fix Committed
importance:	Undecided → Low

Launchpad Janitor (janitor) wrote on 2010-03-31:

#9

This bug was fixed in the package mountall - 2.10

---------------
mountall (2.10) lucid; urgency=low

  * Rework the Plymouth connection logic; one needs to attach the client to
    the event loop *after* connection otherwise you don't get disconnection
    notification, and one needs to actually actively disconnect in the
    disconnection handler.
  * For safety and sanity reasons it becomes much simpler to create the
    ply_boot_client when we connect, and free it on disconnection. Thus the
    presence or not of this struct tells us whether we're connected or not.
    LP: #524708.
  * Flush the plymouth connection before closing it and exiting, otherwise
    updates may be pending and the screen have messages that confuse people
    while X is starting (like fsck at 90%). LP: #487744.

  * Replace the modal plymouth prompt for error conditions with code that
    continues working in the background while prompting. This most benefits
    the old "Waiting for" message, which can now allow you to continue to
    wait and it can solve itself. LP: #527666, #545435.
  * Integrate fsck progress updates into the same mechanism.
  * Allow fsck messages to be translated. LP: #390740.
  * Change fsck message to be a little less alarming. LP: #545267.
  * Add hard dependency on Plymouth; without it running, mountall will
    ignore any filesystem which doesn't show up within a few seconds or that
    fails to fsck or mount. If you don't want graphical splash, you simply
    need not install themes.

  * Improve set of messages seen with --verbose, and ensure all visible
    messages are marked for translation. LP: #446592.
  * Reduce priority of failed to mount error for remote filesystems since
    we try again, and this just spams the console. LP: #504224.

  * Keep hold of the dev_t when parsing /proc/self/mountinfo, then after
    mounting /dev (or seeing that it's mounted) create a quick udev rules
    file that adds the /dev/root symlink to this device. LP: #527216.
  * Do not try and update /etc/mtab when it's a symbolic link. LP: #529993.
  * Remove odd -a option from mount calls, probably a C&P error from the
    fsck code long ago. LP: #537135.
  * Wait for Upstart to acknowledge receipt of events, even if we don't
    hang around for them to be handled.
  * Always run through try_mounts() at least once. LP: #537136.
  * Don't keep mountall running if the only remaining unmounted filesystems
  *
-- Scott James Remnant <email address hidden> Wed, 31 Mar 2010 19:37:31 +0100
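
The "flush before closing" entry in the changelog above can be illustrated with a small Python sketch. This is not the Plymouth client API; `BufferedStatusClient` and `last_shown` are hypothetical stand-ins for any client that queues updates before sending them. It only shows why a queued "clear the message" update is lost if the connection is closed without flushing, leaving a stale "fsck at 90%" on screen.

```python
from typing import Callable, List


class BufferedStatusClient:
    """Hypothetical stand-in for a client that queues updates before sending."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send
        self._pending: List[str] = []
        self.connected = True

    def update(self, message: str) -> None:
        """Queue a message; nothing reaches the screen until flush()."""
        self._pending.append(message)

    def flush(self) -> None:
        """Deliver every queued message in order."""
        while self._pending:
            self._send(self._pending.pop(0))

    def close(self, flush_first: bool = True) -> None:
        """Disconnect, optionally delivering queued messages first."""
        if flush_first:
            self.flush()
        self._pending.clear()
        self.connected = False


def last_shown(flush_first: bool) -> str:
    """Return what the 'screen' shows after the client exits."""
    shown: List[str] = []
    client = BufferedStatusClient(shown.append)
    client.update("fsck: 90%")
    client.flush()
    client.update("")  # request to clear the message
    client.close(flush_first=flush_first)
    return shown[-1]


if __name__ == "__main__":
    print(repr(last_shown(flush_first=False)))
    print(repr(last_shown(flush_first=True)))
```

Without the flush, the last thing delivered is the 90% message; with it, the clearing update arrives before the connection goes away.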

Changed in mountall (Ubuntu):
status:	Fix Committed → Fix Released

D J Eddyshaw (david-eddyshaw) wrote on 2010-04-05:

#10

Sometime over the past few days, a problem very much like this one has appeared on my system:

https://bugs.launchpad.net/ubuntu/+source/sysvinit/+bug/554079

Could this be a further problem with mountall?

Pjotr12345 (computertip) wrote on 2010-04-14:

#11

The bug is apparently not fixed. On a fully updated Lucid, my computer hangs during boot because of fsck, which stops at 71%. Only a hard reboot helps.

Changed in mountall (Ubuntu):
status:	Fix Released → Confirmed

Steve Langasek (vorlon) wrote on 2010-04-14:

#12

This bug was fixed, but there are other bugs that will be tracked in other bug reports.

Changed in mountall (Ubuntu):
status:	Confirmed → Fix Released

Paul Pascal (ppascal) wrote on 2010-05-04:

#13

I was upgrading from 9.10 to 10.04 on my Dell Ubuntu XPS 410s and am having a similar issue. It appears that the upgrade was either interrupted or hung partway through: when I came back from work, the screen was black and not responding to anything. A hard reboot got me to the following:

"mount: mounting none on /dev failed: No such device"
"chroot: cannot execute /etc/apparmor/initramfs: No such file or directory"

Then, after a couple of minutes:

a bunch of udevd[2815]: SYSFS{} messages saying: "will be removed in a future udev version, please use ATTR{} to match the event device, or ATTRS{}= to match a parent device, in /etc/udev/rules.d/65-libmtp.rules:87"

Then, this intimidating message: "udevd[2815]: specified user 'usbmux' unknown"

And this final line: "os_part has been mounted 27 times without being checked, check forced."

Whatever that process is, it hangs interminably.

Please help with a simple, step-by-step solution.

Thanks so much!
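
For context on the "mounted 27 times without being checked, check forced" line: ext filesystems keep a mount counter and a maximum in the superblock, which `tune2fs -l <device>` prints as "Mount count" and "Maximum mount count". The Python sketch below parses that report and says whether a count-based check would be due. The helper names are hypothetical, the rule "due when the count reaches a positive maximum" is a simplification, and time-based check intervals are not covered.

```python
import re
from typing import Optional


def _field(report: str, name: str) -> Optional[int]:
    """Extract an integer field such as 'Mount count' from tune2fs -l output."""
    match = re.search(rf"^{re.escape(name)}:\s*(-?\d+)\s*$", report, re.MULTILINE)
    return int(match.group(1)) if match else None


def check_due(report: str) -> bool:
    """Return True if the mount count has reached a positive maximum."""
    count = _field(report, "Mount count")
    limit = _field(report, "Maximum mount count")
    if count is None or limit is None or limit <= 0:
        return False  # missing fields, or count-based checking disabled
    return count >= limit


if __name__ == "__main__":
    sample = "Mount count:              27\nMaximum mount count:      27\n"
    print(check_due(sample))
```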

chris_c (c-camacho) wrote on 2010-05-12:

#14

I do not believe this bug to be fixed, as I am still seeing exactly this behaviour.

Changed in mountall (Ubuntu):
status:	Fix Released → New

Michael (michaeljt) wrote on 2010-05-12:

#15

I just saw something similar - a scheduled fsck which proceeded nice and fast up to 70% and then very slowly, with no disk activity and apparently a lot of CPU activity (fan going, laptop warming up). I rebooted at 83% having got tired of waiting, and no fsck was scheduled on the next boot. Pressing "C to cancel" did not work. This is a machine that was originally (re-)installed as Karmic with ext4 and upgraded to Lucid one day before the first beta.

Not surprisingly there was no information about that particular boot in the syslog. Please let me know if I can provide any useful information though.

Patrick Den (pat31) wrote on 2010-05-12:

#16

I see exactly the same behaviour as Michael.
This happens on three completely different computers, with different graphics cards (nvidia, intel, ati).
It happens not just once, but every time a scheduled check is run.
However, I let it run for more than half an hour, until it finally reached 100% and continued normally.

Takkat (takkat-nebuk) wrote on 2010-05-13:

#17

Identical issue as Michael and Patrick on 2 different machines here: a netbook running a fresh install of Lucid UNR, and a desktop system with upgrade to Lucid LTS from Karmic.

Takkat (takkat-nebuk) wrote on 2010-05-13:

#18

Ah, I see: this is most likely Bug #571707

Michael (michaeljt) wrote on 2010-05-14:

#19

chris_c: does bug #571707 look like what you are seeing? (And does https://launchpad.net/ubuntu/lucid/+source/mountall/2.15 fix the problem for you?) If so, you might want to set the status of this bug back to "Fix released".

Patrick Den (pat31) wrote on 2010-05-14:

#20

It's worse. Now it says "severe error", but there is nothing wrong with my partition or hard disk.
Synaptic shows 'mountall' at version 2.15, so I assume this new version was installed a few days ago.

Martin Erik Werner (arand) wrote on 2010-05-24:

#21

@Patrick Den:
If the "severe error" message is only seen when you cancel the disk check, this is most likely Bug #582035

Patrick Den (pat31) wrote on 2010-05-26:

#22

@arand, you are right. The scheduled fsck is now working just fine.

Anders (eddiedog988) on 2014-03-13

Changed in mountall (Ubuntu):
status:	New → Confirmed
