partman and grub2 fail in server amd64

Bug #567345 reported by Sergei Vorobyov
This bug affects 1 person
Affects            Status         Importance   Assigned to    Milestone
busybox (Ubuntu)   Fix Released   High         Colin Watson
grub2 (Ubuntu)     Invalid        Undecided    Unassigned

Bug Description

Binary package hint: grub2

ubuntu-server-amd64 daily 20100419.1 and preceding versions fail to install correctly

See https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/527401
starting from message #24 (see also #34, where forking is suggested)

Sergei Vorobyov (svorobyov) wrote :

CentOS 5.4 installed on the same configuration (no BIOS tweaks) without any problems,
recognizing

the 160GB HDD as hda
the 3 x 2TB HDDs as sda, sdb, sdc, respectively,

and correctly installing grub on hda's MBR.

It would suffice to borrow their installer and partitioning tool.

Colin Watson (cjwatson) wrote :

Just to avoid wasting time: there is absolutely no possibility of us completely replacing our installer with the entirely different installer in CentOS.

Could you please attach the log files here? The point of opening a new bug was so that I would be able to think clearly about this one without having to wade through the old bug for logs :-)

Colin Watson (cjwatson) wrote :

I can reproduce the "exceeds the loop-partition-table-imposed maximum" bug; I'll work on fixing that. (If you want it tracked, open a bug specifically for it. Each separate problem should have a separate bug report - otherwise I'll just work on whatever I can reproduce, but you won't get specific notifications of fixes.)

Colin Watson (cjwatson) wrote :

The loop limit bug appears to be due to an incomplete patch. I'll reopen bug 543838 for this; subscribe to that bug for notifications of a fix.

Sergei Vorobyov (svorobyov) wrote :

Unfortunately:
1. The Ubuntu 10.04 rescue option has neither scp nor ssh (I wanted to
    scp the logs but could not), whereas CentOS 5.4, which has them, does not like ext4.
2. Ubuntu Live does not understand RAIDs.
3. Thus the logs were lost with the CentOS installation.
4. Moreover, I started badblocks -fsvw (read-write) tests just to be 100% sure of the disks'
    quality.
5. They run pretty slowly.

Once they are done (maybe tomorrow), I'll be able to repeat the installation of Lucid Server 64;
I still have the partitions.

Just to repeat (since you do not need logs to understand where the error is):

1. Partman assigned sda, sdb, sdc to my three 2TB SATA disks and sdd to a small 160GB IDE disk,
    despite the fact that
2. in the BIOS I set the boot order to 1) IDE, 2) SATA, 3) SATA, 4) SATA.
3. I made all the partitions on the small sdd.
4. But grub flashed "installing on sda".

Obviously, it's a simple device-mapping error, the kind everyone makes now and then when installing
grub by hand in the presence of several disks.

Colin Watson (cjwatson) wrote :

I've reproduced the failure to offer /dev/sdc1 as a spare. How odd.

Colin Watson (cjwatson) wrote :

The shell trace of the spare failure is thoroughly perplexing:

  + descriptions=/dev/vdb2 (500MB; ext4), /dev/vdb free #1 (1098911MB; FREE SPACE), /dev/vdc1 (98MB; ext4), /dev/vdc2 (500MB; ext4), /dev/vdc free #1 (1098911MB; FREE SPACE)
[...]
  + local description=/dev/vdb2 (500MB; ext4)
  + [ /dev/vdb2 (500MB; ext4), /dev/vdb free #1 (1098911MB; FREE SPACE), /dev/vdc1 (98MB; ext4), /dev/vdc2 (500MB; ext4), /dev/vdc free #1 (1098911MB; FREE SPACE) = /dev/vdb2 (500MB; ext4), /dev/vdb free #1 (1098911MB; FREE SPACE), /dev/vdc1 (98MB; ext4), /dev/vdc2 (500MB; ext4), /dev/vdc free #1 (1098911MB; FREE SPACE) ]
  + descriptions=

The penultimate trace line there corresponds to this source line:

  if [ "${descriptions#*, }" = "$descriptions" ]; then

So why isn't it stripping the first part (/dev/vdb2 and its parenthesis and trailing comma-space) from $descriptions? This is beginning to look like a shell bug, but I can't reproduce it from an interactive shell prompt. Perhaps we're looking at memory corruption here?
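
For reference, the expected behaviour of that expansion can be checked in any POSIX shell. The sketch below uses a shortened copy of the list from the trace; the variable names `rest` and `first` are illustrative and are not taken from the partman source.

```shell
# Shortened copy of the comma-separated list seen in the trace.
descriptions='/dev/vdb2 (500MB; ext4), /dev/vdb free #1 (1098911MB; FREE SPACE), /dev/vdc1 (98MB; ext4)'

# ${var#pattern} removes the shortest prefix matching the pattern,
# so "*, " strips everything up to and including the first ", ".
rest="${descriptions#*, }"

# ${var%%pattern} removes the longest matching suffix, leaving the first item.
first="${descriptions%%, *}"

printf 'first: %s\n' "$first"
printf 'rest:  %s\n' "$rest"

# Same comparison as the quoted source line: the stripped value equals
# the original only when no ", " separator is left in the list.
if [ "$rest" = "$descriptions" ]; then
    echo "single item"
else
    echo "more items follow"
fi
```

In the failing trace both sides of the comparison were identical even though separators were clearly present, which is what a correctly working shell should not produce.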

Colin Watson (cjwatson) wrote :

Sergei: No, you have described the problem incorrectly. partman does not assign device names at all. The kernel assigns device names, and for a variety of reasons it pays essentially no attention to the boot order configured in your BIOS. Any software that assumes that Linux device ordering corresponds to BIOS boot ordering is wrong and should be corrected.
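
Because kernel names such as sda or sdd are not tied to the BIOS boot order, it is usually safer to identify a disk by a persistent name. The following is a minimal sketch, assuming a udev-managed system that populates /dev/disk/by-id; the function name `list_stable_names` is hypothetical and not part of any installer.

```shell
# list_stable_names [DIR]: print "stable-name -> resolved device" for each
# symlink in DIR (defaults to /dev/disk/by-id, assumed to be managed by udev).
list_stable_names() {
    dir="${1:-/dev/disk/by-id}"
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}

list_stable_names /dev/disk/by-id
```

Each output line pairs a model/serial-based name with whatever kernel name the disk received on this boot, which makes it easier to tell the small IDE disk from the large SATA ones regardless of ordering.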

Regarding scp: run 'anna-install openssh-client-udeb' and you'll get scp.

As of last month, grub-installer should ask whether GRUB should be installed to a default location or somewhere else. If you're not seeing this, I need logs to understand why. For best results, the logs should be generated from an installation attempt in which DEBCONF_DEBUG=developer was added as a boot parameter.

Colin Watson (cjwatson) wrote :

Congratulations, this wins the prize for the most obscure bug I've dealt with this release cycle. :-) I tracked it down to a stack handling bug in the busybox shell ...

Changed in busybox (Ubuntu):
assignee: nobody → Colin Watson (cjwatson)
importance: Undecided → High
status: New → In Progress
Colin Watson (cjwatson)
Changed in busybox (Ubuntu):
status: In Progress → Fix Committed

Colin Watson (cjwatson) wrote :

I cannot reproduce your GRUB problems. With my fixed parted and busybox packages applied to the running installer, I was able to do a test installation as follows:

  {/dev/vda1, /dev/vdb1, /dev/vdc1 (spare)} -> /dev/md0 (unused)
  {/dev/vda2, /dev/vdb2, /dev/vdc2 (spare)} -> /dev/md1 (ext4, /)

(The odd device names are because I was using virtio block devices in KVM. This shouldn't affect the outcome here; in fact if anything virtio usually exposes additional problems.)

GRUB was correctly installed to /dev/vda and /dev/vdb (though it might have been nice if it had been installed to /dev/vdc too; I'm not sure what the correct behaviour with spares is), and the resulting system booted with no assistance. Incidentally, the installer did indeed ask me to confirm whether I wanted to install GRUB to the MBR, and if I'd said no to that I could have entered one or more target devices manually.

Your output in https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/527401/comments/28 clearly indicates that GRUB Legacy was used, not GRUB 2. Why this should be the case, I don't know - we don't do this by default for RAID, and the syslog you posted in https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/527401/comments/37 shows no evidence of this happening. Perhaps it was some odd side-effect of the parted bug, not that I can think how.

Regarding your fdisk comment in https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/527401/comments/39, this is quite natural and expected because partman does not use partitioned RAID devices, but instead simply writes filesystems directly to /dev/md*. Thus, there's nothing for fdisk to read.
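
One rough way to see why fdisk finds nothing is to look for the two-byte MBR boot signature (55 aa at offset 510) at the start of the device. This is only a heuristic sketch: the helper name `has_mbr_signature` is hypothetical, other kinds of boot sector can carry the same signature, and reading a real device normally requires root.

```shell
# has_mbr_signature FILE: succeed if bytes 510-511 of FILE are 55 aa,
# the boot signature that accompanies an MBR-style partition table.
has_mbr_signature() {
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "55aa" ]
}

# Example (needs read access to the device, typically root):
#   has_mbr_signature /dev/md1 && echo "partition table signature present"
```

On an array that holds a filesystem directly, the check would typically fail, whereas a partitioned disk such as /dev/vda would typically pass.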

Could you please try a daily build once my two fixes so far have landed (which should be the case by Saturday's daily build) and see what's left? My instinct is that there was a fair amount of collateral damage from early errors and that this has created substantial confusion.

Sergei Vorobyov (svorobyov) wrote :

Thank you. I hope it works; I will test it as soon as possible. However,
I am traveling and will be away from my server until Wednesday.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package busybox - 1:1.13.3-1ubuntu11

---------------
busybox (1:1.13.3-1ubuntu11) lucid; urgency=low

  * rmescapes-stack.patch: Refresh stack pointers after makestrspace in
    _rmescapes (backported from dash, although I reinvented it independently
    too; LP: #567345).
 -- Colin Watson <email address hidden> Thu, 22 Apr 2010 14:15:15 +0100

Changed in busybox (Ubuntu):
status: Fix Committed → Fix Released

Marcus Tomlinson (marcustomlinson) wrote :

This release of Ubuntu is no longer receiving maintenance updates. If this is still an issue on a maintained version of Ubuntu please let us know.

Changed in grub2 (Ubuntu):
status: New → Incomplete

Marcus Tomlinson (marcustomlinson) wrote :

This issue has sat incomplete for more than 60 days now. I'm going to close it as invalid. Please feel free to re-open it if this is still an issue for you. Thank you.

Changed in grub2 (Ubuntu):
status: Incomplete → Invalid