root raid installs have bad grub config

Bug #33649 reported by James Troup
Affects | Status | Importance | Assigned to
debian-installer (Ubuntu) | Fix Released | High | Ubuntu Installer Team
grub (Ubuntu) | Fix Released | Undecided | Unassigned
grub-installer (Ubuntu) | Fix Released | Undecided | Unassigned

Bug Description

I have a machine with two SATA drives and the following layout:

 o sda1 100M /dos FAT16
 o sda2 large RAID
 o sda3 swap

 o sdb1 large RAID
 o sdb2 swap

and the two large RAID partitions in a RAID 0 device. I installed breezy with this setup and it worked flawlessly. I just tried current dapper (20060303) and after the first reboot post-install, grub failed, saying it couldn't find the kernel. Looking at the config, it's pointing at (hd0,0) which is obviously wrong. Changing it to (hd0,1) allowed Linux to boot.
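The one-line fix the reporter describes lives in /boot/grub/menu.lst: GRUB legacy counts partitions from 0, so sda2 is (hd0,1). A sketch of the corrected stanza, with the title, kernel filename, and options assumed for illustration only:

```
title  Ubuntu
# installer wrote: root (hd0,0)  <- the FAT16 partition, no kernel there
root   (hd0,1)                   # sda2, the RAID member holding /boot
kernel /boot/vmlinuz root=/dev/md0 ro
initrd /boot/initrd.img
```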

Revision history for this message
James Troup (elmo) wrote :

regression from breezy

Revision history for this message
Martin Bailey (martin-pcalpha) wrote :

I am confirming the same problem with Dapper Flight 5. My setup includes two SATA disks configured with a few RAID arrays. However, the first partitions of the disks are FAT16 partitions for Windows swap. Grub configures the boot as (0,0) (sda1) when it should be (0,1) (md/0).

Changed in debian-installer:
status: Unconfirmed → Confirmed
Revision history for this message
Cornald (debian-sgrewe) wrote :

I also have the same problem.
I did a dist-upgrade of my 5.10 install, which had run quite fine for 4 months.
Now I'm using the Ubuntu kernel 2.6.15-19 (both 386 and k7), but my SATA drives aren't recognized.
/dev only knows:
/hda (regular IDE disk)
/hdc (cd-writer)
/hdd (dvd-writer)

I have 2 SATA drives running different RAID levels (0 and 1).
/ is on sda3 and RAID 1 (I'm not sure it is at all, because I didn't have much time and Breezy ran fine ;-)).
Now I'm going to build a new kernel and add support for SATA drives and RAID.

Revision history for this message
Cornald (debian-sgrewe) wrote :

Edit:
My old kernel (Ubuntu 2.6.12-9-k7) is still working well.
After apt-get update I receive a "regular" kernel panic because sda3 still isn't found.

Specs:
Board: Asus A7N8X Del. Rev. 2 (BIOS 1008)
Dapper Flight 6 (dist-upgraded on Wed, 4th of April)

Revision history for this message
Michael Heča (orgoj) wrote :

Same with a clean install of Ubuntu DD i386 Flight 6 on two SATA disks.

Partitions are:
md0 sd[ab]1 swap
md1 sd[ab]2 ext3 /
md2 sd[ab]5 raidfs /home

Grub is installed with root (hd0,0) but should be (hd0,1).

Revision history for this message
Jeff V Stein (jvstein) wrote :

I had this problem after espresso finished. It seems that the new grub was not installed to the MBA like usual.

It was using the previous grub install and it wasn't finding the menu/kernels.

Revision history for this message
Jeff V Stein (jvstein) wrote :

Err. should be MBR. Too much school.

My equipment:

4 PATA drives on a 3ware 7210 RAID controller.

Partitions:
sda1 /boot
sda4 /home
sda5 /
sda6 swap

Revision history for this message
JD Evora (jdevora-saadian) wrote :

I tried it with Ubuntu Server Beta and the same thing happened.

I installed a RAID 1 where the partitions are:
 1 GB swap
 10 GB /
 rest of the disk /var

My root device should be (hd0,1), but grub was configured to use (hd0,0) (the swap partition), and the reboot after the installation failed.

Revision history for this message
Rui Mesquita (rpedro78) wrote :

The same has happened to me installing Ubuntu Dapper beta. In my case grub uses the menu and boots from another installation on a SATA disk, even though I installed to the second IDE disk. This does not seem to be exclusively an Ubuntu bug: I also had this (or a similar) problem on another PC when installing Kubuntu 5.10, and on another Debian-based distro where grub wouldn't boot at all.

current setup
2 IDE disks
1 SATA disk

2nd IDE disk partition
hdb1 swap
hdb2 ext3 (root)
hdb5 ext2 (home)
hdb6 data

SATA disk partition
sda1 swap
sda2 ext3 (root)
sda5 data
etc..

grub boots and uses the menu from /dev/sda2, even when I explicitly give 'root=/dev/hdb2' at the command line. The 'solution' for me has been to boot from the Dapper install CD and choose 'boot from first hard disk'; then it boots correctly... any clues why?

Revision history for this message
therunnyman (therunnyman) wrote :

Confirming (see bug #44261 for details).

Revision history for this message
patrash (the-razvan) wrote :

I have installed Kubuntu 6.06 LTS; same problem.
My partition table looks like this:
/dev/sda1 swap
/dev/sda2 linux raid
/dev/sdb1 linux raid

/dev/md0, formatted with ReiserFS, mounted as root.

grub tries to boot from (hd0,0), but that's swap.

I've tried to change that to (hd0,1), but I didn't succeed.
Right now I have repartitioned sda1 as linux raid and am starting a new fresh install (the 5th one).

Revision history for this message
patrash (the-razvan) wrote :

and it booted!

Revision history for this message
Reuben Firmin (reubenf) wrote : SATA / grub bug

Same here.

Ironically, grub was working just fine on Dapper flight installs until I installed a new IDE hard disk; configuration is now:

hdc: 200GB IDE
hdd: 40GB IDE
sda: 160GB SATA

I then used qtparted to alter the partitions on sda, rebooted, and grub crapped out with error 22. Since I've been on the upgrade path since hoary, I figured a fresh install of dapper wouldn't hurt.

I tried the dedicated install CD first, attempting to put / on the SATA disk. [NB: USB keyboard not supported!] Bad move all round; the grub install failed, and although the fallback lilo install claimed to have worked, grub again chimed in with error 22 on the next reboot (grub had previously been on the MBR of hdd, so that probably wasn't touched by the lilo config).

Then I ran the [kubuntu] live cd; it booted up once, allowing me to mount various partitions and move data around. I shut down, and later rebooted again on the live cd. This time it froze while trying to start X.

I reverted to the alternate install cd, and this time tried to install / on hdd. Grub tried to install to hd0, but again failed; this time, thankfully, the lilo install worked, and I'm up and running.

However, this was altogether pretty painful.

Revision history for this message
Ramon Casha (rcasha) wrote :

I have had problems with a new installation of Ubuntu server 6.10 with the root partition on RAID1, and have identified two apparent errors:

1. In grub/menu.lst, the kernel parameter was root=/dev/hda1 instead of root=/dev/md0

2. In /etc/fstab, the UUID of the root partition is incorrect. Replaced the UUID with /dev/md0 and it worked.
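Check (2) can be made mechanical by comparing the UUID recorded for "/" in /etc/fstab against what blkid reports for the array. The helper below is a sketch; the function name fstab_root_uuid is invented for illustration and is not part of any package.

```shell
#!/bin/sh
# Sketch: print the UUID recorded for the "/" entry in an fstab-style file.
# fstab_root_uuid is a hypothetical helper name.
fstab_root_uuid() {
    awk '$1 ~ /^UUID=/ && $2 == "/" { sub(/^UUID=/, "", $1); print $1 }' "$1"
}

# Usage idea (compare against the real array; run as root):
#   [ "$(fstab_root_uuid /etc/fstab)" = "$(blkid -o value -s UUID /dev/md0)" ] \
#       || echo "fstab root UUID does not match /dev/md0"
```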

Revision history for this message
Soren Hansen (soren) wrote :

Reproduced on Dapper and Edgy. Feisty works just fine. If the fix for this can be cherrypicked, it would be great to be able to include it in the upcoming point release.

Revision history for this message
Philip Goodfellow (mail-philipgoodfellow) wrote :

Still present in Gutsy

see: bug #152229

Revision history for this message
Max Waterman (davidmaxwaterman+launchpad) wrote :

I think I just saw this problem when I installed gutsy from the alternate cd.

I have an existing RAID5 array on /dev/hd[cegi] and /dev/[cbef] which is /dev/md2, and installed to /dev/sd[ab]1, with swap on /dev/sd[ab]5.

The installation went pretty much as expected (considering I had a previous FC6 RAID1 installation on there), but the reboot failed.

To make it work, I had to edit boot from hd(4,0) to hd(0,0).

That this works has me confused since the /boot/grub/device.map has /dev/sdb as hd(4,0), so hd(4,0) is the correct device and should work.

Revision history for this message
Alexander Dietrich (adietrich) wrote :

I ran into this (or something similar) with a fresh Hardy server install.
My setup is:

/dev/md0 = /dev/sda1 + /dev/sdb1, ext2 mounted at "/boot"
/dev/md1 = /dev/sda2 + /dev/sdb2, xfs mounted at "/"

The menu.lst created by the installer contained "root (hd0,1)"
and "kernel /boot/vmlinuz...", so it looks like the installer was
not aware of the separate boot partition.
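When /boot is its own array, GRUB's root must point at the boot partition and the kernel/initrd paths become relative to it. A sketch of what a corrected stanza for the layout above might look like; the version strings are assumptions for illustration:

```
root   (hd0,0)                           # md0 = sda1 + sdb1, mounted at /boot
kernel /vmlinuz-2.6.24-19-server root=/dev/md1 ro
initrd /initrd.img-2.6.24-19-server
```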

Revision history for this message
tricky1 (tricky1) wrote :

The installer on the alternate CD looks broken as soon as one intends to use _any_ kind of RAID.

Same problem on the server edition. Whoever would want to install a real-world server without mirroring?

I wonder why the responsible manager seems not to recognize the importance of this bug?

Revision history for this message
tricky1 (tricky1) wrote :

Please wake up ;-)

Changed in debian-installer:
assignee: nobody → ubuntu-installer
Revision history for this message
tricky1 (tricky1) wrote :

It is very disappointing to see that all alphas of Intrepid still have this problem. I wonder how this deeply rooted bug shall be corrected in the few remaining weeks, if currently nothing seems to be done in that direction?

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

I have linked this bug to the following Blueprint for Intrepid:
 * https://blueprints.launchpad.net/ubuntu/+spec/boot-degraded-raid

And covered it briefly in the specification:
 * https://wiki.ubuntu.com/BootDegradedRaid#head-bad7ecb75ab9cd9fd207658b7aecf983f919e33a

I am working on this for Intrepid.

:-Dustin

Revision history for this message
tricky1 (tricky1) wrote :

Thx, good news from Dustin. Questions:

- When will your work be integrated into the daily ISO?
- Did you read my additional comments added at the bottom of the wiki page?

I am convinced that a separate entry in grub's menu list is a short-sighted approach; please comment!

Revision history for this message
Dustin Kirkland  (kirkland) wrote : Re: [Bug 33649] Re: root raid installs have bad grub config

On Wed, Jul 30, 2008 at 9:10 AM, tricky1 <email address hidden> wrote:
> - When will your work be integrated into the daily iso?

Ideally, sometime before FeatureFreeze, currently set for 28 Aug 2008.
 * https://wiki.ubuntu.com/IntrepidReleaseSchedule

> - Did you read my additional comments added at the bottom of the wiki page?

I did.

> I am convinced that a separate entry in grubs menu list is a short
> sighted approach, please comment!

Responses to your wiki page post...

I am not suggesting separate entries in a grub menu---I do not
understand where you got that idea. I'm happy to review your patches,
if you have code for an alternative approach.

The design is not yet final. The design I set forth emulates what I
did for yaboot, the PowerPC bootloader, several years ago. There have
been no complaints of the yaboot functionality, which is in use by a
number of commercial, enterprise PPC users.

To your points....

a) SMART output is practically worthless, IMHO. I have used it
extensively, and it has completely failed to report a negative status
for known bad drives, and it has marked drives as 'bad' that went on
to operate flawlessly for years. If users want to use smartmontools,
they can do that in userspace, but I do not recommend integrating such
an imprecise technology into the init/boot processes.

b) We already have a timer, set to 30 seconds, which waits for the
device containing the root filesystem to show up. Perhaps you want to
see a graphic representation of that timer. If so, that is a
reasonable wishlist item. Feel free to open a new bug against
initramfs-tools for that one. Patches welcome.

c) This is how initramfs-tools works already.

d) This is how my patches to Bug #120375 work, with the exception that
they support only a single degraded RAID device for $ROOT. You can
open a new bug, with complete instructions on how to reproduce it, for
a more complex scenario involving multiple degraded RAID devices
required for booting.

:-Dustin

Revision history for this message
tricky1 (tricky1) wrote :

@Dustin:

a) Even if SMART status is not perfect, it is a good idea to warn the user during boot in case errors are detected.

b) What I tried to suggest is not a visible timer (a good idea, but not urgent), but the possibility of specifying during _installation_ whether automatic degraded booting will ever occur (default timer=0 -> no automatism).

If automatism is disabled, or the user intervenes during the timer's run, she can make a well-informed decision about which arrays shall be run in degraded mode.

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

On Thu, Jul 31, 2008 at 7:58 AM, tricky1 <email address hidden> wrote:
> a) Even if smart status is not perfect it is a good idea to warn user
> during boot in case of errors detected.

I still don't see this as practical, as it would necessitate moving
smartctl (and dependent .so's) into the initramfs. But if you have a
working patch, I'll gladly look at it.

> b) What I tried to suggest is not a visible timer (good idea but not
> urgent), but the possibility that during _installation_ one can specify
> if automatic degraded booting will ever occur. (default timer=0 -> no
> automatism)

This is handled by the patches for Bug #120375. With that code, you
are able to provide on the kernel boot parameters one of:
 * bootdegraded=true
 * bootdegraded=false

The default is 'false'. The timer is fixed at 30 seconds, which will
be spent in 0.1 second intervals, looking for new devices that might
have shown up, or a clean RAID device.

:-Dustin
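The wait loop described above can be pictured as a simple polling function. This is a simplified sketch of the behavior, not the shipped initramfs-tools script; the function name is invented, and fractional sleep is assumed to be available.

```shell
#!/bin/sh
# Sketch of the initramfs-style wait: poll for the root device in 0.1 s
# steps until the timeout expires. wait_for_device is a hypothetical name.
wait_for_device() {
    dev="$1"
    tries="$2"                      # number of 0.1 s intervals (300 ~= 30 s)
    i=0
    while [ "$i" -lt "$tries" ]; do
        [ -e "$dev" ] && return 0   # device showed up; proceed with boot
        sleep 0.1                   # assumes a sleep that accepts fractions
        i=$((i + 1))
    done
    return 1                        # timed out; bootdegraded= logic decides next
}
```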

Revision history for this message
tricky1 (tricky1) wrote :

@Dustin:
Do I err in my impression that your intention is to get some quick fix, while my concern is to make Ubuntu as simple and reliable as possible? Now is the moment to make sure that the user interface while booting mirrored RAIDs is understandable and manageable not only for experts but also for ordinary people like me ;-)

a) I am not fluent enough in bash scripting, so I can give only some hints easily realizable by willing experts:
(btw, smartctl adds 176K to the 8M initrd and imho is worth the effort).

Do this for all devices used in the array:

for dev in /dev/sda /dev/sdb; do
  smartctl --smart=on --offlineauto=on --saveauto=on -H "$dev"
  rc=$?
  if [ "$rc" != 0 ]; then
    echo "smartctl error $rc on $dev"
  fi
done

b) Did I misunderstand in assuming that kernel boot parameters must be set beforehand (when grub is running)? Or how will the user interface look?

Sorry, but I lack the English knowledge to make you understand how I imagine the user interface in case of problems :-(

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

On Fri, Aug 1, 2008 at 9:25 AM, tricky1 <email address hidden> wrote:
> @Dustin:
> Do I err with my impression that your intension is to get some quick fix

My intention is to fix this multi-faceted bug in an incremental and
solvable manner.

That the bug has been around for 2.5 years is a testament to the fact
that it is difficult to solve. Breaking it down into manageable bits
is a tried and true software development process.

> a) I am not fluent enough in bashing, so I can give only some hints easily
> realiziable for willing experts:

I understand. My asking for a patch is a polite way of saying that
while you or others might be willing/interested in implementing the
functionality, I do not think it to be a productive use of my time.
However, I'm not entirely closed off to the idea (should someone else
do a good job with it), and I'll gladly review the code and offer
constructive criticism.

> (btw smartctl adds 176K to 8M of initrd and imho is worth the effort).
>
> Do for all devices used in array:

Firstly, this is off-topic for the current bug against grub. If you
really desire SMART disk scanning in the initramfs, please open a new
wishlist bug against initramfs-tools and capture these requests there.

smartctl cannot necessarily be used on all devices in an array. It
does not work on all drives. It will not work on USB connected media,
flash disks, or other increasingly popular alternatives to traditional
IDE/SCSI disks. And in my experience, it has yielded both false
positives, and false negatives.

I think that SMART is more trouble than it's worth *in the initramfs*.
Please use SMART to your heart's content at runtime. Run it in a
cron job, using smartctl+mdadm, and email yourself the results, if you
want.

> b) Did I misunderstand when assuming that kernel boot paramters must be
> set beforhand (when grub is running) or how will the user interface
> look.

Kernel boot parameters can be statically set in the grub menu.lst, or
dynamically in the grub boot menu. The boot process needs to be as
automatic as possible. Asking questions detracts from the ability to
perform unattended boots.

:-Dustin

Revision history for this message
tricky1 (tricky1) wrote :

@Dustin

> My intention is to fix this multi-faceted bug in an incremental and
> solvable manner.
My suggestions ARE along these lines :D

> That the bug has been around for 2.5 years is a testament to the fact
> that it is difficult to solve. Breaking it down into manageable bits
> is a tried and true software development process.
No; I am not the only one who had found workarounds for booting in urgent cases ...
That it is still present in Ubuntu has other roots...
Did anybody object to your tried-and-true development process?
You, as a bright expert, might deny, like some of your very bright colleagues,
that sometimes non-experts can give hints for better UI design.

>> a) I am not fluent enough in bashing, so I can give only some hints easily
>> realiziable for willing experts:

> I understand. My asking for a patch is a polite way of saying that
> while you or others might be willing/interested in implementing the
> functionality, I do not think it to be a productive use of my time.

Well, after having spent some time finding out how the patch you requested
could be realized, I won't publicly tell you what I think of your so-called politeness.

> However, I'm not entirely closed off to the idea (should someone else
> do a good job with it), and I'll gladly review the code and offer
> constructive criticism.

After all, I won't give any further comments on your attitude.

PS: And your explanation has shown that I was not too wrong when assuming
a few messages above that users will have to fiddle around with grub... :(

This was my very last comment on the subject. Byebye ...

tricky

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

I'm attaching a patch to the grub-installer package which adds support for iteratively installing GRUB to each disk in an mdadm-managed software RAID providing /boot.

I'm not done testing it yet, but I'm posting it here in case anyone has any productive comments on the patch.

:-Dustin

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Also, added to a bzr branch:
 * https://code.launchpad.net/~kirkland/grub-installer/33649

:-Dustin

Revision history for this message
Colin Watson (cjwatson) wrote :

I don't think this is a viable long-term solution, because grub-install after installation won't do the same thing. Thus the task on grub should definitely stay open. However, it may help in the short term.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub-installer - 1.32ubuntu2

---------------
grub-installer (1.32ubuntu2) intrepid; urgency=low

  [ Colin Watson ]
  * Add another guard against calling 'udevadm info' with an empty device
    name (LP: #30772).

  [ Dustin Kirkland ]
  * grub-installer: add support for writing an MBR on each disk in an
    mdadm-managed RAID providing /boot (LP: #33649)

 -- Colin Watson <email address hidden> Thu, 07 Aug 2008 19:02:15 +0100

Changed in grub-installer:
status: New → Fix Released
Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Colin-

I have a proposed patch in a bzr branch of grub:
 * https://code.launchpad.net/~kirkland/grub/33649

The first patch handled grub-installer used in the debian-installer, and this one handles grub-install (in the grub package), used in the installed operating system thereafter.

The grub-install_better_raid.diff patch basically reverts the raid.diff patch, replacing the (somewhat convoluted) getraid_mdadm() function with a far simpler algorithm matching that which we use in grub-installer.

Additionally, this patch adds support to detect if /boot is on a RAID device; if so, we will iterate over each hard disk in the RAID providing /boot, installing GRUB to the MBR. I tried to match the logic and syntax used in grub-installer as closely as possible.

If you have a /dev/md0 RAID providing your root filesystem, consisting of /dev/sda1 and /dev/sdb1, then with this patch you can run any of:
 # grub-install /dev/md0
 # grub-install /dev/sda
 # grub-install /dev/sdb
 # grub-install /dev/sda1
 # grub-install /dev/sdb1
And GRUB will be installed into the MBR of both /dev/sda and /dev/sdb.

The output of grub-install looks like this:

root@ubuntu:~# grub-install /dev/md0
Searching for GRUB installation directory ... found: /boot/grub
Installing GRUB to /dev/sda
Installing GRUB to /dev/sdb
Installation finished. No error reported.
This is the contents of the device map /boot/grub/device.map.
Check if this is correct or not. If any of the lines is incorrect,
fix it and re-run the script `grub-install'.

(hd0) /dev/sda

:-Dustin
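The member-enumeration step described above can be sketched in a few lines of shell. This parses a /proc/mdstat-style line rather than calling mdadm, and md_member_disks is an invented name for illustration; the real grub-installer logic differs.

```shell
#!/bin/sh
# Sketch: list the underlying disks of an md array from a /proc/mdstat-style
# file. md_member_disks is hypothetical, not the code shipped in the package.
md_member_disks() {
    md="$1"; mdstat="$2"
    grep "^$md : " "$mdstat" |         # e.g. "md0 : active raid1 sdb1[1] sda1[0]"
        tr ' ' '\n' |
        grep '\[' |                    # keep member tokens like "sda1[0]"
        sed 's/\[.*//; s/[0-9]*$//' |  # strip "[0]" and the partition number
        sort -u
}

# Usage idea:
#   for d in $(md_member_disks md0 /proc/mdstat); do
#       grub-install "/dev/$d"         # one MBR write per member disk
#   done
```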

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

This is the second revision of patch 2/2 for bug #33649, based on feedback from slangasek. He questioned the previous revision, where I was installing GRUB onto all devices of a RAID providing /boot, every time.

This revision will:
 1) install to all devices, if the target on the command line is /dev/md*
 2) install only to a specific RAID member, if the device on the command line is an actively sync'd device in a RAID providing /boot
 3) otherwise, operate with the current behavior of grub-install

https://code.launchpad.net/~kirkland/grub/33649b

:-Dustin

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Per feedback from slangasek on IRC, I have another revision of the grub patch.

This one does not assume/hard-code hd0; rather, it uses the previously existing logic to find the appropriate BIOS device.

And if the user passes a device for which we cannot determine the appropriate BIOS device, we ask the user to either install directly to the RAID or define it in device.map.

My branch is updated with revision 840 on lp:~kirkland/grub/33649b/

For the purposes of review, I'm attaching the contents of debian/patches/grub-install_better_raid.diff

:-Dustin

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Revision 841 in lp:~kirkland/grub/33649 adds some smarts (back) into the handling of the multiple devices on a RAID providing /boot.

Rather than using the same (hd0,0) type string for each device, it independently calculates the partition number. This problem is similarly reported in bug #46223, though for update-grub.

For the "hd0" bit, we take what we can find in device.map. Also, I'm killing the printing of device.map when installing onto a RAID, since it's somewhat misleading.

:-Dustin
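The per-device calculation described above can be illustrated with a small function that turns a partition device into a GRUB legacy (hdN,M) string via device.map. Everything here, including the name to_grub_device, is an assumption for illustration, not the patch's actual code.

```shell
#!/bin/sh
# Sketch: map a partition device (e.g. /dev/sdb3) to a GRUB legacy (hdN,M)
# string using a device.map file. to_grub_device is a hypothetical name.
to_grub_device() {
    part="$1"; map="$2"
    disk=$(echo "$part" | sed 's/[0-9]*$//')    # /dev/sdb3 -> /dev/sdb
    num=$(echo "$part" | sed 's/^.*[^0-9]//')   # /dev/sdb3 -> 3
    # find the (hdN) entry whose second field is this disk
    bios=$(awk -v d="$disk" '$2 == d { gsub(/[()]/, "", $1); print $1 }' "$map")
    [ -n "$bios" ] || return 1                  # not in device.map
    echo "($bios,$((num - 1)))"                 # GRUB partitions are 0-based
}
```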

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub - 0.97-29ubuntu35

---------------
grub (0.97-29ubuntu35) intrepid; urgency=low

  [ Dustin Kirkland ]
  * debian/patches/00list: add grub-install_better_raid.diff.
  * debian/patches/grub-install_better_raid.diff: add support for writing an
    MBR on each disk in an mdadm-managed RAID providing /boot (LP: #33649).

 -- Steve Langasek <email address hidden> Fri, 15 Aug 2008 11:53:12 -0700

Changed in grub:
status: New → Fix Released
Revision history for this message
tricky1 (tricky1) wrote :

Bug still present in Alpha 4

Revision history for this message
Colin Watson (cjwatson) wrote :

Please confirm whether this is still present in the beta, and if it is then exactly what the current symptoms are. Since there have been quite a lot of changes in this area, I don't expect that it's still behaving the same way, and something more than "bug still present" would be useful.

Revision history for this message
tricky1 (tricky1) wrote :

If Mr. Kirkland had ever tried himself to install Intrepid on a pair of REAL hard disks, with mirrored /boot on both disks and an encrypted filesystem with LVM on another mirrored RAID, he would have found out that ALL the problems are still present in the Beta!

Colin, it's not only me being tired of how the server team currently works. Please have a look at the logs of the weekly IRC meeting: not a single word anymore about all that mess regarding RAID and crypt ...

Changed in debian-installer:
status: Confirmed → Fix Released
Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Please go back and read the original description of this bug.

The scenario you describe is completely different from the bug that James Troup reported in March of 2006, which has been fixed. I'm closing this bug as such.

This may well merit a new bug report, but it absolutely *does not belong* in this bug. You may open a new bug against mdadm, providing detailed, step-by-step instructions of exactly how to reproduce the problem.

:-Dustin

Revision history for this message
tricky1 (tricky1) wrote :

Whoever believes that bugs go away by closing them on Launchpad ....

I have a machine with two SATA drives and the following layout:

 o sda1 100M /dos FAT16
 o sda2 large RAID 1 for / (encrypted LVM)
 o sda3 small RAID 2 for /boot

 o sdb1 100M /dos FAT16
 o sdb2 large RAID 1 for / (encrypted LVM)
 o sdb3 small RAID 2 for /boot

Intrepid Beta does not allow such an installation and aborts the install :-(

This IS the same bug that has existed for years!!!

And NO, Mr Kirkland, I will not open a new bug for this old stuff you might have fixed a long time ago.

Revision history for this message
Steve Langasek (vorlon) wrote :

If it aborts installation, then this is clearly not the same bug at all. The original bug report is that the system fails to boot after install, which is entirely different.

You need to open a separate bug report for the separate issue.
