partitioner hangs if libparted sees partitions for which device nodes don't exist

Bug #770041 reported by Dave Gilbert
This bug affects 1 person
Affects                Status        Importance  Assigned to   Milestone
partman-auto (Ubuntu)  Fix Released  High        Colin Watson
Natty                  Fix Released  High        Colin Watson

Bug Description

Binary package hint: debian-installer

Daily natty server i386 iso as of today 24th April

md5sum: 6309f74b92e99ce63eb8f91ae2e6bb28 /media/more/isos/natty-server-i386.iso

Installing into a KVM guest that was previously running Maverick Server; the guest had LVM partitions.
The last prompt I saw asked me to confirm that it had got the time zone right, which it had; now I'm left with a purple screen with a bar at the bottom.

The last thing in syslog (from ctrl-alt-f2) is:

partman: Found volume group "server1" using metadata type lvm2
partman-lvm: 4 logical volume(s) in volume group "server1" now active

ps shows the last interesting entries as:
/bin/sh /lib/partman/display.d/10initial_auto
/bin/sh /lib/partman/automatically_partition/25repla

I've attached a screenshot showing the last few lines of /var/log/partman

The last of those shows it's stuck in pipe_wait.

/proc/partitions shows that it has started device mapper and it can see the partitions there, and /dev/mapper is correctly populated.

It seems repeatable, and from expert mode it also occurs when starting the partitioner.

It sounds similar to bug 729394 (though that was marked Fix Released about 4 weeks ago), but it might be closer to bug 757487.

Dave

Dave Gilbert (ubuntu-treblig) wrote :
Colin Watson (cjwatson) wrote :

This is certainly a likely location for problems, but I'm afraid I can't reproduce this myself, which is probably necessary in order for me to fix it. Could you try this:

 * Start an installation attempt as before.
 * When you reach the hostname prompt, switch to Alt-F2, enter 'nano /lib/partman/automatically_partition/25replace/choices', change 'set -e' to 'set -ex', and save and exit.
 * Switch back to Alt-F1. Continue until it hangs.
 * Switch to Alt-F2. Kill all the process IDs listed in 'ps | egrep "partman|parted"', preferably in a single 'kill' command. (The idea of this is to force the full trace to be written out to the log file.)
 * Attach the full syslog and partman logs to this bug. There are two ways to extract them: either select "Save debug logs" from the installer main menu, or else run 'anna-install openssh-client-udeb' from a shell and then use scp to copy them from /var/log/ to another system.
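
The 'set -ex' edit in the second step can also be made non-interactively. This is a minimal sketch, assuming the installer's sed supports -i and that the script contains a line reading exactly 'set -e'; enable_trace is a hypothetical helper name, not part of partman:

```shell
#!/bin/sh
# Hypothetical helper (not part of partman): switch a script from
# 'set -e' to 'set -ex' so that every command it runs is traced.
enable_trace() {
    script=$1
    # Refuse to touch anything that is not a regular file.
    [ -f "$script" ] || return 1
    # Rewrite only a line that consists of exactly 'set -e'.
    sed -i 's/^set -e$/set -ex/' "$script"
}

# Example, using the path from the steps above:
# enable_trace /lib/partman/automatically_partition/25replace/choices
```

The same call would apply to any other partman script that starts with 'set -e'.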

Changed in debian-installer (Ubuntu):
status: New → Incomplete
Dave Gilbert (ubuntu-treblig) wrote :
Dave Gilbert (ubuntu-treblig) wrote :
Dave Gilbert (ubuntu-treblig) wrote :

Hi Colin,
  Logs attached as requested.
(I might be able to do another few tries in the next few hours today, but otherwise the next chance will probably be Friday).

Dave

Changed in debian-installer (Ubuntu):
status: Incomplete → New
Dave Gilbert (ubuntu-treblig) wrote :

jibel asked for the output of pvdisplay/vgdisplay/lvdisplay; it is attached here, taken from alt-f2 on the installer CD.

The setup is relatively simple: a virtio hard disk with a partition vda5 holding the only physical volume, one volume group (server1) on that, and in it the root and swap_1 logical volumes for the server itself. Note that there is also an 'export' logical volume together with a snapshot of it.
(The 'export' volume used to be exported over iSCSI and itself holds a whole Ubuntu installation.)

Dave

Dave Gilbert (ubuntu-treblig) wrote :

For your fun and excitement, here is some strace output

for the process shown as /bin/sh /lib/partman/automatically_partition/25repla:

strace -ivp
[00914416] open("/var/lib/partman/infifo", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666

The two 10initial_auto processes are stuck in a read on fd 7 (a pipe).

The parted-server process is stuck in

[006ee416] open("/var/lib/partman/stopfifo", O_RDONLY

Dave
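
Both open() calls in the strace output above are on FIFOs, and opening a FIFO blocks until some other process opens the other end. The sketch below reproduces that behaviour with a scratch FIFO. It assumes a `timeout` command is available; fifo_writer_blocks is a hypothetical name, and this only illustrates the blocking semantics, not what parted_server itself does:

```shell
#!/bin/sh
# Hypothetical helper: succeed if opening the FIFO for writing is still
# blocked after one second, which means no reader has it open.
fifo_writer_blocks() {
    fifo=$1
    rc=0
    # ': > file' performs the same O_WRONLY|O_CREAT|O_TRUNC open seen in
    # the strace output; timeout exits with status 124 if it has to kill it.
    timeout 1 sh -c ': > "$1"' sh "$fifo" || rc=$?
    [ "$rc" -eq 124 ]
}

# Example with a scratch FIFO:
# mkfifo /tmp/demo-fifo
# fifo_writer_blocks /tmp/demo-fifo && echo "writer is blocked: no reader"
```

If a reader such as `cat /tmp/demo-fifo` is started first, the open completes and the function returns non-zero.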

Brian Murray (brian-murray) wrote :

I took a regular LVM install of Maverick ubuntu-server and was able to install Natty ubuntu-server over it without any problems.

Colin Watson (cjwatson) wrote :

Are you sure you did that 'set -ex' edit? I'm not seeing the output from it, and that's the bit I really need.

Colin Watson (cjwatson) wrote :

Thinking about it, it wouldn't hurt to have the same 'set -ex' edit in /lib/partman/automatically_partition/15reuse/choices too.

Colin Watson (cjwatson) wrote :

Incidentally, the problem is not as you suggested on #ubuntu-testing that nothing is listening to the output of that script - what's happened is that the communication channel with parted_server has somehow got out of sync. That much I guessed from your initial report. The problem is determining exactly where that happened ...

Jean-Baptiste Lallement (jibel) wrote :

I can reproduce the hang of partman with a partition on server1-export.
partman hangs in 35dump but I don't know exactly where yet because the output is buffered and the log is truncated.

Changed in debian-installer (Ubuntu):
importance: Undecided → Medium
status: New → Confirmed
Colin Watson (cjwatson)
affects: debian-installer (Ubuntu) → partman-auto (Ubuntu)
Changed in partman-auto (Ubuntu):
assignee: nobody → Colin Watson (cjwatson)
importance: Medium → High
status: Confirmed → In Progress
Colin Watson (cjwatson)
summary: - natty server installation hang
+ partitioner hangs if libparted sees partitions for which device nodes
+ don't exist
Changed in partman-auto (Ubuntu Natty):
milestone: none → ubuntu-11.04
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package partman-auto - 93ubuntu16

---------------
partman-auto (93ubuntu16) natty; urgency=low

  * If an LV contains a partition table, then libparted will see the
    partitions but device nodes won't be created for them, which broke the
    reuse and replace automatically_partition scripts. Tolerate this
    situation by ignoring device nodes that don't exist, or for which
    blockdev breaks in other ways (LP: #770041).
 -- Colin Watson <email address hidden> Tue, 26 Apr 2011 11:08:33 +0100
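
The approach described in the changelog entry can be sketched roughly as follows. This is not the actual partman-auto code: usable_devices is a hypothetical helper, and the use of `blockdev --getsize64` as the probe is an assumption made for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not the real partman-auto change: print only the
# candidate paths that exist as block devices and that blockdev can size.
usable_devices() {
    for dev in "$@"; do
        # libparted may report a partition inside an LV for which no
        # device node was ever created; skip those.
        [ -b "$dev" ] || continue
        # Skip devices for which blockdev fails in any other way.
        blockdev --getsize64 "$dev" >/dev/null 2>&1 || continue
        printf '%s\n' "$dev"
    done
}

# Example (device names are illustrative only):
# usable_devices /dev/vda5 /dev/mapper/server1-export1
```

Filtering the candidates first means a missing node is skipped quietly instead of aborting a script that runs under 'set -e'.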

Changed in partman-auto (Ubuntu Natty):
status: In Progress → Fix Released
Dave Gilbert (ubuntu-treblig) wrote :

Hi,
  Just to confirm that on the release iso this works fine on the same VM. Thank you for the quick fix!

Dave
