Ignoring a broken clock results in infinite reboots; not ignoring results in fsck failure; no solution to this problem

Bug #563618 reported by Oliver Grawert
This bug affects 1 person
Affects                      Status         Importance   Assigned to       Milestone
e2fsprogs                    Unknown        Unknown
e2fsprogs (Ubuntu)           New            Undecided    Unassigned
  Lucid                      Won't Fix      Undecided    Unassigned
flash-kernel (Ubuntu)        Fix Released   High         Oliver Grawert
  Lucid                      Fix Released   High         Oliver Grawert
initramfs-tools (Ubuntu)     Fix Released   High         Unassigned
  Lucid                      Fix Released   High         Unassigned
mountall (Ubuntu)            Invalid        High         Unassigned
  Lucid                      Invalid        High         Unassigned

Bug Description

Binary package hint: util-linux

while e2fsck lets you add an e2fsck.conf with an option to ignore broken system clocks, fsck does not have the ability to override this.

that behavior, together with a mountall bug that sends the boot into an endless loop instead of giving the user a maintenance shell when fsck exits, renders many ARM platforms we support unbootable (we have several development boards that come either with an unloaded battery (imx51) or without a battery at all (omap beagleboard)).

the only workaround in this case is to edit both fstab files on the rootfs (/etc/fstab and /lib/init/fstab) and change the pass number for the root filesystem from 1 to 0
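
The fstab edit described above can be sketched as a small filter. This is only an illustration, not something from the report: the function name `disable_root_fsck` is made up, and it assumes the usual six-field fstab layout in which the second field is the mount point and the sixth is the fsck pass number.

```shell
#!/bin/sh
# Sketch only: print an fstab with the fsck pass number (6th field) of the
# root filesystem entry set to 0. Comments, blank lines and short lines pass
# through unchanged. awk rebuilds the modified line with single spaces.
disable_root_fsck() {
    awk '
        /^[ \t]*#/ || NF < 6 { print; next }   # comments, blanks, short lines
        $2 == "/" { $6 = 0 }                   # root entry: never fsck at boot
        { print }
    ' "$1"
}

# Hypothetical usage: write to a scratch file, review it, then replace the original.
# disable_root_fsck /etc/fstab > /tmp/fstab.new
```

The same filter would have to be run on both files named above (/etc/fstab and /lib/init/fstab). Note that a pass number of 0 turns off the boot-time check of the root filesystem entirely, not just the clock test.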

Revision history for this message
Oliver Grawert (ogra) wrote :

bug #501801 and #562898 seem to be related mountall bugs

Revision history for this message
Oliver Grawert (ogra) wrote :

setting to high and milestoning for lucid since that renders many armel installations unbootable

Changed in mountall (Ubuntu Lucid):
importance: Undecided → High
Changed in util-linux (Ubuntu Lucid):
importance: Undecided → High
Changed in mountall (Ubuntu Lucid):
milestone: none → ubuntu-10.04
Changed in util-linux (Ubuntu Lucid):
milestone: none → ubuntu-10.04
Revision history for this message
Amit Kucheria (amitk) wrote :

Beagleboard has no RTC backup battery. Though one can be soldered on, most devices won't have any.

So we need fsck to skip the time check.

Changed in util-linux (Ubuntu Lucid):
status: New → Confirmed
Changed in mountall (Ubuntu Lucid):
status: New → Confirmed
Revision history for this message
Robert Nelson (robertcnelson) wrote :

Just adding some OMAP Board Details.

Development Boards:
Original Bx/C2 Beagles have no provision for installing a backup battery.

C3 & C4(current shipping) Beagles have an area reserved for end users to solder a 3V coin battery. I believe the XM will have the same provision, i just don't have it next to me at the moment to look...

IGEPv2 : Revision C and greater, has an area reserved for end users to solder a 3V coin battery

End Products:
Always Innovating Touch Book: doesn't have a battery location.

http://www.alwaysinnovating.com/home/index.htm
Hack: http://www.alwaysinnovating.com/wiki/index.php/I2C_RTC

This is the current e2fsck nasty hack beagle and igepv2 users are using to run ubuntu:

/etc/e2fsck.conf

[problems]

# Superblock last mount time is in the future (PR_0_FUTURE_SB_LAST_MOUNT).
0x000031 = {
    preen_ok = true
    preen_nomessage = true
}

# Superblock last write time is in the future (PR_0_FUTURE_SB_LAST_WRITE).
0x000032 = {
    preen_ok = true
    preen_nomessage = true
}

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

This is not a mountall issue, if e2fsck's broken_system_clock=true option is not working, that's an e2fsck bug

Changed in mountall (Ubuntu Lucid):
status: Confirmed → Invalid
Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

Robert: that's not a "nasty hack" - that's the canon way to work around this issue - this should be identical to just doing:

[options]
broken_system_clock = true

which will set preen_ok for those two problems

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

fwiw, here's the e2fsck code for "broken_system_clock", so it really should be directly equivalent!

                if ((code == PR_0_FUTURE_SB_LAST_MOUNT) ||
                    (code == PR_0_FUTURE_SB_LAST_WRITE)) {
                        profile_get_boolean(ctx->profile, "options",
                                            "broken_system_clock", 0, 0,
                                            &broken_system_clock);
                        if (broken_system_clock)
                                ptr->flags |= PR_PREEN_OK;
                }

Changed in util-linux (Ubuntu Lucid):
assignee: nobody → Scott James Remnant (scott)
Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

Having chatted to ogra, we think that broken_system_clock and that config snippet are both working fine.

The problem is that this results in a "fix" to the root filesystem, and that in turn results in the requirement that we reboot (the mounted root fs has changed!)

And rebooting clears the hardware clock again, so we do it all over again in a loop.

There isn't a fix for this :-(

Changed in util-linux (Ubuntu Lucid):
assignee: Scott James Remnant (scott) → nobody
milestone: ubuntu-10.04 → none
status: Confirmed → Triaged
summary: - there is no way to tell fsck to ignore broken clocks on embedded systems
+ Ignoring a broken clock results in infinite reboots; not ignoring
+ results in fsck failure; no solution to this problem
Revision history for this message
Dave Martin (dave-martin-arm) wrote : RE: [Bug 563618] Re: there is no way to tell fsck to ignore broken clocks on embedded systems

My workaround for this kind of problem has been to store the clock value in
a file during shutdown/reboot, and initially set the time to that value plus
a small amount < reboot latency, before the root filesystem is checked.

This is far from ideal, but does ensure that most user software never sees a
clock rollback. When (if) a network interface is finally brought up, the
clock can be corrected if ntp is installed: this will always produce a
roll-forward, never a roll-back.

Could we tweak /etc/init/hwclock.conf to use this as a fallback?

It's debatable whether this should be enabled by default though: as for
e2fsck, on a "working" system, it may be better to flag up unexpected clock
rollbacks rather than silently patching it up.

Cheers
---Dave
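
A minimal sketch of the save-and-restore idea described in this comment. Nothing here is taken from an actual Ubuntu package: the stamp path, the function names and the 5-second default slack are all assumptions made for illustration, and `date -s @SECONDS` is GNU date syntax. It also assumes the stamp file can be read from the root filesystem while it is still mounted read-only.

```shell
#!/bin/sh
# Sketch only. STAMP is a hypothetical path, not a file any package uses.
STAMP=${STAMP:-/var/lib/saved-clock}

# At shutdown/reboot: record the current time as seconds since the epoch.
save_clock() {
    date +%s > "$STAMP"
}

# Print the epoch value the clock should be moved forward to, or print
# nothing if the current time ($2) is already at or past saved ($1) + slack ($3).
restore_target() {
    saved=$1 now=$2 slack=${3:-5}
    target=$((saved + slack))
    if [ "$now" -lt "$target" ]; then
        echo "$target"
    fi
}

# Early in boot, before the root filesystem is checked.
restore_clock() {
    [ -r "$STAMP" ] || return 0
    target=$(restore_target "$(cat "$STAMP")" "$(date +%s)")
    # Only ever roll the clock forward, never back.
    [ -z "$target" ] || date -s "@$target"
}
```

Keeping the comparison in `restore_target` separate from the call that actually sets the clock means the roll-forward rule can be checked without root privileges.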

Revision history for this message
Jamie Bennett (jamiebennett) wrote :

@davem

Of course writing at shutdown/reboot only works on a clean shutdown, not on a loss of power. Writing the clock value on _boot-up_ may be wiser: yes, your clock will be out by boot time + uptime, but at least it will be a sane value for next time and, as you mentioned, with network connectivity you can retrieve the correct time via ntp.

Revision history for this message
Dave Martin (dave-martin-arm) wrote : RE: [Bug 563618] Re: Ignoring a broken clock results in infinite reboots; not ignoring results in fsck failure; no solution to this problem

Sure, I in no way suggested that my fix was complete :)

A working clock, or synchronising via NTP before checking the rootfs, is the
only full workaround.

However, I would suggest that if broken_system_clock is set, then the target
filesystem should not even be treated as unclean by e2fsck, provided that
all other requirements for cleanness of the filesystem are met--- and should
not be modified at all in this case. This could solve the problem--- we've
already demonstrated that "fixing" the last mount time in the filesystem is
nonsensical if the clock is invalid anyway.

Revision history for this message
Oliver Grawert (ogra) wrote :

even with broken_system_clock the infinite reboot occurs on first boot ...
one thing we came up with during the discussion is to capture the last mount time from the filesystem in the initramfs and then set the clock to this value before mountall runs. we will probably have a session about this at UDS to discuss the several workarounds we see; the bug in itself is unsolvable unless we start using non-journalled filesystems

Revision history for this message
Dave Martin (dave-martin-arm) wrote :

A UDS discussion on this sounds like a good idea.

Revision history for this message
Oliver Grawert (ogra) wrote :

here is a proof of concept code snippet (untested yet) we could put in initramfs to work around the issue ...

#!/bin/sh
#
# use the fixrtc cmdline option in your bootloader to
# automatically set the hardware clock to the date of
# the last mount of your root filesystem to avoid fsck
# to get confused by the superblock being in the future

BROKEN_CLOCK=""

for x in $(cat /proc/cmdline); do
 case ${x} in
 root=*)
  UUID=${x#*=}
  UUID="${UUID#*=}"
 ;;
 fixrtc)
  BROKEN_CLOCK=1
 ;;
 esac
done

# quote the variable: with an unquoted empty value, [ -n ] is always true
if [ -n "$BROKEN_CLOCK" ]; then
 ROOTDISK=$(readlink -f "/dev/disk/by-uuid/$UUID")

 TIMESTR=$(dumpe2fs -h "$ROOTDISK" 2>/dev/null | grep "Last mount time")
 TIME=${TIMESTR#*:}

 DISKYEAR=$(date -d "${TIME}" +%Y)
 CLOCKYEAR=$(date +%Y)

 if [ "$DISKYEAR" -gt "$CLOCKYEAR" ]; then
  hwclock --set --date="${TIME}"
 fi
fi

Revision history for this message
Oliver Grawert (ogra) wrote :

ok, here a tested and properly working set of two script snippets of the initramfs suggestion above:

ogra@ubuntu:~$ cat /usr/share/initramfs-tools/hooks/fixclock
#!/bin/sh -e
# initramfs hook for fixclock

MINKVER="2.6.24"
PREREQ=""

# Output pre-requisites
prereqs()
{
        echo "$PREREQ"
}

case "$1" in
    prereqs)
        prereqs
        exit 0
        ;;
esac

. /usr/share/initramfs-tools/hook-functions

# We use date, hwclock and dumpe2fs
copy_exec /bin/date /bin
copy_exec /sbin/hwclock /sbin
copy_exec /sbin/dumpe2fs /sbin

ogra@ubuntu:~$ cat /usr/share/initramfs-tools/scripts/init-premount/fixclock
#!/bin/sh -e
# initramfs init-premount script for fixclock

PREREQ="udev"

# Output pre-requisites
prereqs()
{
        echo "$PREREQ"
}

case "$1" in
    prereqs)
        prereqs
        exit 0
        ;;
esac

# use the fixrtc cmdline option in your bootloader to
# automatically set the hardware clock to the date of
# the last mount of your root filesystem to avoid fsck
# to get confused by the superblock being in the future

BROKEN_CLOCK=""

for x in $(cat /proc/cmdline); do
        case ${x} in
        root=*)
                UUID=${x#*=}
                UUID="${UUID#*=}"
        ;;
        fixrtc)
                BROKEN_CLOCK=1
        ;;
        esac
done

# quote the variable: with an unquoted empty value, [ -n ] is always true
if [ -n "$BROKEN_CLOCK" ]; then
        ROOTDISK=$(readlink -f "/dev/disk/by-uuid/$UUID")

        TIMESTR=$(dumpe2fs -h "$ROOTDISK" 2>/dev/null | grep "Last mount time")
        TIME=${TIMESTR#*:}

        DISKYEAR=$(date -d "${TIME}" +%Y)
        CLOCKYEAR=$(date +%Y)

        if [ "$DISKYEAR" -gt "$CLOCKYEAR" ]; then
                hwclock --set --date="${TIME}"
                hwclock --hctosys
        fi
fi

Revision history for this message
Oliver Grawert (ogra) wrote :

hmm... that code still has some issues, i will try to work them out over the weekend

Revision history for this message
Oliver Grawert (ogra) wrote :

so here is the fixed code. it seems the above hit a race where date was still finishing while fsck had already started, so date needed to have 1 min added to the last mount time ... also, the check for the year limits this to only certain use cases, so i dropped it. the code below works reliably now (together with /usr/share/initramfs-tools/hooks/fixclock from above, and indeed fixrtc needs to be set permanently on the cmdline):

ogra@ubuntu:~$ cat /usr/share/initramfs-tools/scripts/local-premount/fixrtc
#!/bin/sh -e
# initramfs local-premount script for fixrtc

PREREQ=""

# Output pre-requisites
prereqs()
{
        echo "$PREREQ"
}

case "$1" in
    prereqs)
        prereqs
        exit 0
        ;;
esac

# use the fixrtc cmdline option in your bootloader to
# automatically set the hardware clock to the date of
# the last mount of your root filesystem to avoid fsck
# to get confused by the superblock being in the future

BROKEN_CLOCK=""

for x in $(cat /proc/cmdline); do
        case ${x} in
        root=*)
                UUID=${x#*=}
                UUID="${UUID#*=}"
        ;;
        fixrtc)
                BROKEN_CLOCK=1
        ;;
        esac
done

if [ "$BROKEN_CLOCK" ];then
        ROOTDISK=$(readlink -f /dev/disk/by-uuid/$UUID)

        TIMESTR=$(dumpe2fs -h $ROOTDISK 2>/dev/null|grep "Last mount time")
        TIME=${TIMESTR#*:}

        date --set="${TIME} 1 minute"
        hwclock --systohc
fi
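
To illustrate what the `${TIMESTR#*:}` expansion in the script above leaves behind, here is a small sketch. The helper name `last_mount_time` and the sample timestamp are made up; the line layout is assumed to match the "Last mount time" line printed by `dumpe2fs -h`.

```shell
#!/bin/sh
# Sketch only: extract the timestamp from a dumpe2fs "Last mount time" line.
last_mount_time() {
    # $1 is one line of "dumpe2fs -h" output, for example:
    #   Last mount time:          Fri Apr 23 14:55:24 2010
    t=${1#*:}                # shortest match: drop text up to the FIRST colon
    t=${t#"${t%%[! ]*}"}     # trim the leading blanks that padded the column
    printf '%s\n' "$t"
}
```

Because `#` removes the shortest matching prefix, the colons inside the time of day survive. The script above passes the untrimmed value to date, which appears to cope with the leading blanks; the trimming here only makes the result easier to inspect.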

Revision history for this message
Dave Martin (dave-martin-arm) wrote :

How about:

for x in $(cat /proc/cmdline); do
 case ${x} in
 root=*)
  # strip the "root=" prefix first, so the inner patterns can match
  ROOTSPEC=${x#root=}
  case ${ROOTSPEC} in
   UUID=*) ROOTDEVPATH=/dev/disk/by-uuid/${ROOTSPEC#UUID=} ;;
   *) ROOTDEVPATH=$ROOTSPEC ;;
  esac
  ;;

...

done

ROOTDEVPATH=$(readlink -f "$ROOTDEVPATH")

[...]

This supports the root=<device node> case. Hackers will thank you (I know I
will)
Extending it to support LABEL= would be easy too.

(Extra case statement used to avoid sed (slow) and ${//} (bash-specific))
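
A sketch of how the LABEL= extension mentioned above might look, using only case statements and POSIX parameter expansion. The function names are hypothetical and this is not the code in the linked branch; it assumes udev has populated /dev/disk/by-uuid and /dev/disk/by-label, and it ignores the escaping udev applies to unusual characters in labels.

```shell
#!/bin/sh
# Sketch only: map the value of a root= kernel argument to a path under /dev.
root_spec_to_path() {
    case $1 in
        UUID=*)  echo "/dev/disk/by-uuid/${1#UUID=}" ;;
        LABEL=*) echo "/dev/disk/by-label/${1#LABEL=}" ;;
        /dev/*)  echo "$1" ;;
        *)       return 1 ;;   # other forms (e.g. major:minor) not handled here
    esac
}

# Scan a kernel command line (passed as one string) for root= and fixrtc.
parse_cmdline() {
    ROOTSPEC="" BROKEN_CLOCK=""
    for x in $1; do            # unquoted on purpose: split into words
        case $x in
            root=*) ROOTSPEC=${x#root=} ;;
            fixrtc) BROKEN_CLOCK=1 ;;
        esac
    done
}
```

In an initramfs script this would presumably be called as `parse_cmdline "$(cat /proc/cmdline)"`, with the result of `root_spec_to_path "$ROOTSPEC"` then passed through `readlink -f`.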

Revision history for this message
Oliver Grawert (ogra) wrote :

adding flash-kernel here since flash-kernel-installer needs to have the fixrtc option in the default cmdline

Changed in flash-kernel (Ubuntu Lucid):
milestone: none → ubuntu-10.04
importance: Undecided → High
status: New → Triaged
Revision history for this message
Oliver Grawert (ogra) wrote :

dave, did you test your fix and could you provide a branch with tested code ?

Revision history for this message
Dave Martin (dave-martin-arm) wrote :

Extended with the above suggestions, see: lp:~dave-martin-arm/initramfs-tools/initramfs-tools-fixrtc

Tested on babbage3 (booting with rdinit=/bin/sh and hacking the clock before exec'ing /init)

Revision history for this message
Steve Langasek (vorlon) wrote :

Have reviewed the change; should not impact boot performance or reliability on systems not booting with 'fixrtc', installs to the correct point in the boot sequence. Still an ugly hack, but we already knew that. :) The only change I'm making is to drop the superfluous call to hwclock --systohc -- we simply don't care about the value in the hardware clock because we're not going to be reading from it at all after this point, and writing this out delays the boot unnecessarily. /etc/init/hwclock-save.conf will write it out for us on shutdown, for all the good that does on the affected hardware.

Merged and uploading.

Revision history for this message
Oliver Grawert (ogra) wrote :

uploading flash-kernel too for the change in the default cmdline

Martin Pitt (pitti)
Changed in mountall (Ubuntu Lucid):
milestone: ubuntu-10.04 → none
affects: util-linux (Ubuntu Lucid) → initramfs-tools (Ubuntu Lucid)
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package initramfs-tools - 0.92bubuntu77

---------------
initramfs-tools (0.92bubuntu77) lucid; urgency=low

  * Add a new 'fixrtc' script that tries to set the system clock forward
    based on the last mount time of the root disk; without this kludge,
    systems without a working RTC will end up in a perpetual reboot loop.
    Thanks to Dave Martin <email address hidden> for the patch. LP: #563618.
 -- Steve Langasek <email address hidden> Fri, 23 Apr 2010 14:55:24 -0700

Changed in initramfs-tools (Ubuntu Lucid):
status: Triaged → Fix Released
Revision history for this message
Martin Pitt (pitti) wrote :

Closing manually due to not mentioning the bug number.

 flash-kernel (2.13ubuntu17) lucid; urgency=low
 .
   * add "fixrtc" option to default cmdline for OMAP3 Beagle, this fixes the
     fsck issues on this board which has no rtc battery by setting the clock to
     the last mount time of the disk during boot.

Changed in flash-kernel (Ubuntu Lucid):
assignee: nobody → Oliver Grawert (ogra)
status: Triaged → Fix Released
Revision history for this message
Oliver Grawert (ogra) wrote :

adding an upstream task for e2fsprogs since that is carrying the actual bug, and also a task for ubuntu's e2fsprogs (though it would be nice if i hadn't had to add that to lucid, since it is unlikely that we will fix it there)

Revision history for this message
Phillip Susi (psusi) wrote :

Could these clock related fsck issues not be worked around by setting the broken clock to the creation time of the root fs when booting so that the current time is at least >= fs creation time?

Revision history for this message
Oliver Grawert (ogra) wrote :

a) it's not setting it to the creation time but to the last mount time, to circumvent the clock being reset on every boot
b) how would i know the creation time
c) it's a messy workaround for an upstream bug ted will hopefully fix upstream in fsck; we can only set the clock to rubbish anyway, does it matter in any way *which* rubbish it is
d) i really prefer that the stamp we set it to moves with the mount time here, instead of having a hard stamp you set it back to on every boot. else your file timestamps might get messy far in the future if your system is temporarily on the network (which automatically syncs the hwclock)

Revision history for this message
Rolf Leggewie (r0lf) wrote :

lucid has seen the end of its life and is no longer receiving any updates. Marking the lucid task for this ticket as "Won't Fix".

Changed in e2fsprogs (Ubuntu Lucid):
status: New → Won't Fix