Bug #495394 “autostart fails on boot time host when network devi...” : Bugs : libvirt package : Ubuntu

Heiko Harders (heiko-harders) wrote on 2009-12-11:

#1

Just rebooted the host, which started checking the file system. Thereafter all domains seemed to be up. Unsure whether this was coincidence (can't remember seeing all domains up after a reboot before), or whether the extra boot time somehow caused the domains to come up as expected.

Heiko Harders (heiko-harders) wrote on 2009-12-14:

#2

I've been able to start all domains consistently on each boot of the host OS by changing the parameters of the partition the host OS is installed on. I've forced a check of this filesystem on each system boot, and all domains now run consistently after the host has booted. The filesystem check only takes a couple of seconds. I still don't know whether the extra delay during boot simply gives libvirt the time it needs to get the domains up, or whether something else is going on.

Quetschke (tobias-quetschke) wrote on 2010-01-07:

#3

I am experiencing the same issue: domains created and managed via libvirt/virsh do not autostart although they are marked as 'autostarting'. The domain runs normally when started manually, and the symlink in /etc/libvirt has been created successfully.
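
For anyone checking the same thing, the autostart flags can be read straight from the symlinks. A minimal sketch, assuming the qemu driver's usual /etc/libvirt/qemu/autostart directory (other drivers use other directories); the function name list_autostart_domains is made up for illustration:

```sh
#!/bin/sh
# List domains flagged for autostart by reading the symlinks libvirt keeps.
# AUTOSTART_DIR is an assumed default for the qemu driver; override as needed.
AUTOSTART_DIR="${AUTOSTART_DIR:-/etc/libvirt/qemu/autostart}"

list_autostart_domains() {
    dir="${1:-$AUTOSTART_DIR}"
    for link in "$dir"/*.xml; do
        # Skip the unexpanded glob (empty or missing directory) and non-symlinks.
        [ -L "$link" ] || continue
        name=$(basename "$link" .xml)
        if [ -e "$link" ]; then
            echo "$name"
        else
            # The link exists but its target definition file does not.
            echo "$name (dangling symlink)"
        fi
    done
}
```

Calling list_autostart_domains with no argument reads the default directory; comparing its output with `virsh list` after a reboot shows which autostart domains did not come up.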

Holger Mauermann (mauermann) wrote on 2010-01-07:

#4

This may be related to upstart and bridge_utils (bug #498245). Try setting bridge_maxwait=0 in /etc/network/interfaces and see if this fixes the problem.

jeffbl (jeff-mulb) wrote on 2010-01-10:

#5

I have bridge_maxwait=0 set for both my bridges, and neither VM I have set to autostart does so. Also on 9.10 64 bit host. This used to work, then stopped, even for new virtual machines I create.

Heiko Harders (heiko-harders) wrote on 2010-01-11:

#6

I tried setting bridge_maxwait=0 and have only booted twice since then to see what happened. On both occasions some of the VMs with autostart booted, but not all of them (2/5 the first time, 4/5 the second). So at best this might have helped a bit, but it is not a solution to the problem.

The filesystem check, which only takes a couple of seconds, is still a good workaround. When I set up my system so that it checks the filesystem (the one on which my host OS is installed, not the one on which the VMs are installed), all VMs start consistently.

Heiko Harders (heiko-harders) wrote on 2010-01-14:

#7

Is there anything we can do to help somebody looking into this? I'm happy to provide more information if necessary. Should we look into other related packages that might cause the problem and file bug reports for those? For me this bug is pretty much a show stopper, autostarting domains is something I really need working.

Chuck Short (zulcss) on 2010-01-25

Changed in libvirt (Ubuntu):
importance:	Undecided → Medium
status:	New → Confirmed

Stuart Young (cef) wrote on 2010-02-17:

#8

msgs1 (3.9 KiB, text/plain)

I too have come across this issue.

In my case, it appears to be related to br0 (my bridge) not becoming live in time.

As a temporary solution, I found that simply adding a 'sleep 4' in the init script (just before it runs libvirtd) alleviates the issue. This gives the bridge enough time to become active. It is a total hack until some way of making it wait until the bridge is up gets implemented.

Attached are two snippets of /var/log/messages:

1. msgs1 - Standard behaviour. First VM tries to spawn, hanging for 30 secs and then gets destroyed. Subsequent VM's start successfully as the bridge is now active.

2. msgs2 - Behaviour if I add 'sleep 4' just before running libvirtd in /etc/init.d/libvirt-bin
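
A bounded wait is one possible refinement of the fixed 'sleep 4'. The sketch below is not taken from the attached logs or from any patch in this thread; wait_for_iface and the SYSFS_NET override are hypothetical names, and it only checks that the device exists under /sys/class/net, which may or may not be enough for libvirt:

```sh
#!/bin/sh
# Wait until a network interface exists, or give up after a timeout.
# SYSFS_NET can be overridden so the function can be tried without root.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

wait_for_iface() {
    iface="$1"
    timeout="${2:-30}"   # seconds; the 30 s default is an arbitrary choice
    waited=0
    while [ ! -e "$SYSFS_NET/$iface" ]; do
        # Give up once the timeout has elapsed.
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In the init script this would take the place of the sleep, for example `wait_for_iface br0 30` just before libvirtd is started. Unlike a fixed sleep, it returns as soon as the bridge appears.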

Stuart Young (cef) wrote on 2010-02-17:

#9

msgs2 (4.6 KiB, text/plain)

Second file attached.

Stuart Young (cef) wrote on 2010-02-19:

#10

Upgraded the libvirt host to lucid (libvirt-bin 0.7.5-5ubuntu7). Since doing so, I have not been able to replicate the issue with guests not starting at boot due to the bridge coming up late (after some/all of the guests have started). Will continue to do more testing.

Notes:
1. eth0 appears to be coming up after the guests are running, but the guests seem to be fine with this and not crash out like they did under karmic.
2. Guests are still running karmic atm (if that's at all relevant?)

Stuart Young (cef) wrote on 2010-02-24:

#11

Upgraded one of the guests to lucid with no issues. All guests still start fine.

As this is a test machine, I can always create more VM's, if that seems like a good idea?

Suggestions for more tests to perform welcome.

Dustin Kirkland  (kirkland) wrote on 2010-04-02:

#12

I converted libvirt to an upstart init script in Lucid, and I expect that this should be fixed.

I believe that comment #11 confirms this, so I'm marking this fix-released.

Please reopen if you can reproduce this in Lucid.

Changed in libvirt (Ubuntu):
status:	Confirmed → Fix Released

Jan van Oorschot (janvanoorschot) wrote on 2010-07-02:

#13

I have reproduced this bug on a fresh Lucid install: libvirtd (running on Lucid) with one domain (also Lucid) on hardware with three virtual bridges (br0, br1 and br2, each connected to a physical Ethernet interface: eth0, eth1 and eth2).

The temporary fix mentioned by Stuart in #8 fixes the problem for me, except that, since I am running Lucid, I had to edit /etc/init/libvirt-bin.conf instead:

pre-start script
        mkdir -p /var/run/libvirt
        # Clean up a pidfile that might be left around
        rm -f /var/run/libvirtd.pid
        echo "libvirtd sleep start" >> /tmp/libvirtd.txt
        sleep 40
        echo "libvirtd sleep end" >> /tmp/libvirtd.txt
end script

4 seconds was too short, and 40 seconds seems to work for me (since this machine is only going to boot about once a year, that is fine by me).

So this race-condition is still present in lucid (IMHO)

Regards, Jan

ossjunkie (ossjunkie) wrote on 2010-07-05:

#14

Yes, it is still present on Lucid server.

After unsuccessfully trying something like:

start on (runlevel [2345] and networking)

in the upstart script, I also resorted to a dirty sleep in the pre-start script. Upstart experts needed!

Changed in libvirt (Ubuntu):
status:	Fix Released → Confirmed

Mika Båtsman (mika-batsman) wrote on 2010-08-10:

#15

I'm not an expert in upstart but solved the problem by creating an upstart task that checks whether all automatically started bridge interfaces are up and made libvirt-bin depend on it.

Patch attached.

Brian Murray (brian-murray) on 2010-08-10

tags:

added: patch

Andreas Ntaflos (daff) wrote on 2010-09-29:

#16

Mika, thank you for the patch and new upstart job. We are trying it out here and find that the bridged-network job seems to wait forever for a "net-device-up" signal to be emitted, thus keeping libvirt-bin from starting.

I only now have begun reading up on upstart but so far I can't find any obvious flaws in your job definition. Waiting on the "net-device-up" signal which is emitted every time a network interface comes up (/etc/network/if-up.d/upstart) and then checking the status of the bridges seems the correct way but it doesn't work for us.

Did you take any additional steps in order to get this upstart job to work correctly?

Mika Båtsman (mika-batsman) wrote on 2010-09-30:

#17

Updated patch for 10.04 (1.0 KiB, text/plain)

I made the original patch on karmic. For some reason it stopped working after upgrading to lucid.

I modified it a bit but forgot to post the changes. Here's an updated patch which I've been using successfully on a couple of Lucid machines for over a month.

Hope it helps to solve the problem.

Andreas Ntaflos (daff) wrote on 2010-09-30:

#18

Thanks for the new patch! I tested it just now and it seems to work, but it's always hard to tell when dealing with race conditions. I'll keep testing.

Out of interest, the only change (apart from the more verbose way of testing the $interface variable) is the added "break" statement, right? I am not familiar enough with the Upstart boot process but why does this change make the job work correctly? I also have not found any info on what exactly ifquery does. Does it just read /etc/network/interfaces and extract the interface names?

Anyway, thanks again!
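
On the ifquery question: as far as I know, `ifquery --list --allow auto` only parses the ifupdown configuration and prints the names of the interfaces marked auto; it does not look at the running state. A rough approximation in shell, assuming a single /etc/network/interfaces file with no source directives (list_auto_ifaces is a made-up name):

```sh
#!/bin/sh
# Print the interface names that appear on "auto" lines of an interfaces file.
# Rough stand-in for `ifquery --list --allow auto`; it ignores "source"
# directives and the other allow-* classes.
list_auto_ifaces() {
    file="${1:-/etc/network/interfaces}"
    # An "auto" line may name several interfaces, so print every field after the keyword.
    awk '$1 == "auto" { for (i = 2; i <= NF; i++) print $i }' "$file"
}
```

Each name is printed on its own line, so the result can be fed to a `for` loop in the same way the job script uses the ifquery output.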

Seb James (sebjames) wrote on 2010-10-13:

#19

I made use of Mika's patch on a 10.04 system and found it solved my problem.

I had been confused and thought that there was something wrong with my/the apparmor config for libvirt, because there were a few apparmor audit messages in the log, but in fact, it was this issue with the bridge interfaces not coming up in time.

Roland Moriz (rmoriz) wrote on 2011-02-23:

#20

problem still exists in 10.10

Roland Moriz (rmoriz) wrote on 2011-02-23:

#21

Mika's patch seems not to work with alias interfaces like "eth0:1"

status bridge-network-interface INTERFACE='eth0:1' 2>/dev/null | awk '{print $3}'
=> ""

ifconfig eth0:1
eth0:1 Link encap:Ethernet HWaddr xxxxxxxxxx
          inet addr:xxxxxxxxx Bcast:xxxxxxx Mask:255.255.255.224
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          Interrupt:45 Base address:0xa000
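
One possible way around this, untested against Mika's patch: alias names such as eth0:1 are not separate kernel devices, so a status check could be made against the parent device instead. A small sketch, with base_iface as a hypothetical helper:

```sh
#!/bin/sh
# Map an alias interface name to its parent device name.
base_iface() {
    # Strip everything from the first ":" onward: eth0:1 -> eth0, br0 -> br0.
    printf '%s\n' "${1%%:*}"
}
```

The job script could then query the status for `$(base_iface "$interface")` rather than for the alias itself. Whether that is the right behaviour for every alias setup is an assumption.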

Heiko Harders (heiko-harders) wrote on 2011-05-12:

#22

I can reproduce this on a fresh Ubuntu 11.04 using two 11.04 virtual machines (I had better luck with my previous 10.10 install that did work properly for me).

The patch provided by Mika does not seem to work for me, libvirt does not seem to be started properly with it (my domains are not shown in virsh with a `list --all' for example). My boot.log shows two lines with `Stopping Check if bridged network is up. OK' though.

Serge Hallyn (serge-hallyn) wrote on 2011-05-12:

#23

I like the idea of Mika's patch. Besides extending it to not break on
eth0:1, should it (and can it) be extended to also support slow network
links which are not bridges?

I'm assigning this temporarily to SpamapS to get his opinion, and give
him a chance to nack the patch if he thinks an imminent new upstart
feature will fix this. If such a feature is not imminent, then I'll
push the fix for o-series and SRU it back.

Changed in libvirt (Ubuntu):
importance:	Medium → High
assignee:	nobody → Clint Byrum (clint-fewbar)

Mika Båtsman (mika-batsman) wrote on 2011-05-12:

#24

Patch V3 (1.2 KiB, text/plain)

Part of this problem is that there are 2 competing mechanisms for bringing the bridges up:

1) bridge-network-interface.conf from bridge-utils
2) network-interface.conf from ifupdown

These two seem to be racing which one gets to handle the interface. It seems that most of the time network-interface.conf is faster and emits net-device-up. bridged-network.conf (from the patch) relies on net-device-up events. Because of the bridge-network-interface.conf the bridged-network.conf didn't seem to get all the net-device-up events which caused occasional failures. At least for me that happened very seldom and that made the problem more difficult to solve.

I've disabled bridge-network-interface.conf and haven't had any problems with networking and libvirt for a few months now.

The attachment has my current solution for 10.04. I haven't tested it with 10.10 or 11.04. It now checks the status of network-interface instead of bridge-network-interface, so it's a bit more generic and could help with a wider range of network configurations, not just bridges. I have also renamed bridged-network.conf to network-up.conf. If you have tested the previous patch and are going to test this one, make sure you don't have both bridged-network.conf and network-up.conf.

I don't have any setups using alias interfaces, so I don't know if this helps with those, but at least it's not that bridge-specific any more.

Heiko Harders (heiko-harders) wrote on 2011-05-16:

#25

It seems there is another problem with my configuration that causes libvirt to have problems with autostarting virtual machines. I am using software RAID 1 with mdadm and my virtual machines are running on an LVM2 partition on md0. It seems the LVM2 volume is not yet available at the point where libvirt tries to start my virtual machines:

libvirtd: 20:42:16.345: 1389: error : qemuAutostartDomain:275 : Failed to autostart VM 'ns': unable to set user and group to '114:125' on '/dev/mapper/storage-st0': No such file or directory
libvirtd: 20:42:16.346: 1389: error : virSecurityDACSetOwnership:125 : unable to set user and group to '114:125' on '/dev/mapper/storage-st1': No such file or directory

So it seems at this point it is not the bridge that is causing problems, but it is mdadm in combination with LVM2. According to my /var/log/boot.log the mdadm monitoring daemon is started after libvirt. But I'm not sure if the monitoring application has anything to do with it.

Heiko Harders (heiko-harders) wrote on 2011-05-24:

#26

I changed my upstart script to ensure both the bridge and the md0 device (on which the LVM volume is located) are started before libvirt is started. In my situation this makes sure all my virtual machines can be started. However, different virtual machines can have different dependencies on (possibly slow) hardware being available or not. Perhaps it is a good idea to create separate upstart scripts for each virtual machine? This way it could be ensured that the hardware a specific virtual machine is relying on is brought up.

I fixed my problems with the following `start on' line in /etc/init/libvirt-bin.conf:

start on runlevel [2345] and net-device-added INTERFACE="br0" and block-device-added DEVNAME="/dev/md0"

br0 is the bridge I am using
md0 is the raid volume on which the LVM2 volumes are located, it seems (although I'm not 100% sure) that the block-device-added event is always fired after all LVM volumes on the block device are up

Clint Byrum (clint-fewbar) wrote on 2011-05-24: Re: [Bug 495394] Re: autostart almost always fails on boot time host

#27

Excerpts from Heiko Harders's message of Tue May 24 18:01:58 UTC 2011:
> I changed my upstart script to ensure both the bridge and the md0 device
> (on which the LVM volume is located) are started before libvirt is
> started. In my situation this makes sure all my virtual machines can be
> started. However, different virtual machines can have different
> dependencies on (possibly slow) hardware being available or not. Perhaps
> it is a good idea to create separate upstart scripts for each virtual
> machine? This way it could be ensured that the hardware a specific
> virtual machine is relying on is brought up.

Another option is to make the start on more broad (start on
net-device-added or block-device-added) and then, in the pre-start or
the daemon's own code, check for the hardware and gracefully refuse to
start if it's not available yet.

This can actually get racy without instance specifiers, though,
because if block-device-added happens between the pre-start deciding
that the block device it needed was not there and the pre-start exiting,
upstart will just consider its job done (it's already in a goal of start,
so upstart won't change it).
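
A pre-start check along these lines might look like the sketch below. It is only an illustration: check_deps is a hypothetical helper, the example paths are the ones from comment #26, and it does nothing about the race described above.

```sh
#!/bin/sh
# Succeed only if every path given as an argument exists;
# report each missing one on stderr.
check_deps() {
    status=0
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "missing: $path" >&2
            status=1
        fi
    done
    return "$status"
}

# Assumed usage inside an upstart pre-start script (not tested here):
#   check_deps /sys/class/net/br0 /dev/md0 || { stop; exit 0; }
```

Listing every missing dependency, rather than stopping at the first, makes the boot log more useful when both the bridge and the block device are late.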

Clint Byrum (clint-fewbar) wrote on 2011-05-24:

#28

Excerpts from Heiko Harders's message of Tue May 24 18:01:58 UTC 2011:
> I changed my upstart script to ensure both the bridge and the md0 device
> (on which the LVM volume is located) are started before libvirt is
> started. In my situation this makes sure all my virtual machines can be
> started. However, different virtual machines can have different
> dependencies on (possibly slow) hardware being available or not. Perhaps
> it is a good idea to create separate upstart scripts for each virtual
> machine? This way it could be ensured that the hardware a specific
> virtual machine is relying on is brought up.
>
> I fixed my problems with the following `start on' line in /etc/init/libvirt-bin.conf:
>
> start on runlevel [2345] and net-device-added INTERFACE="br0" and block-device-added DEVNAME="/dev/md0"
>
> br0 is the bridge I am using
> md0 is the raid volume on which the LVM2 volumes are located, it seems (although I'm not 100% sure) that the block-device-added event is always fired after all LVM volumes on the block device are up
>

Interesting finding, though I think it's a fairly dangerous assumption. It
may happen that way simply because the block device added event isn't
emitted by udev until the kernel has scanned partitions, hence finding
the LVM and enabling it. Probably something that needs build-time and
maybe even an automated test created if it turns into a generic solution
of any kind.

m m (bk-praca) wrote on 2011-06-20: Re: autostart almost always fails on boot time host

#29

Could you please look at this thread:
http://<email address hidden>/msg01444.html ?

Do you have all IP tables related modules loaded when libvirt-bin is started? Do you know what can be done to ensure they are loaded before libvirt-bin is started?

Robbie Williamson (robbiew) on 2011-06-30

Changed in libvirt (Ubuntu):
assignee:	Clint Byrum (clint-fewbar) → Serge Hallyn (serge-hallyn)

Serge Hallyn (serge-hallyn) wrote on 2011-06-30:

#30

@Mika,

thanks for posting your patches along the way. Regarding your third version (in comment #24), is it just the net-device-up for the bridge which you're not seeing? Have you tried leaving bridge-network-interface.conf enabled, and, at the bottom of the loop in its pre-start script, doing

ifup $i

(to force /etc/network/if-up.d/upstart to do the initctl emit for us). So the script would look like:

pre-start script
. /lib/bridge-utils/bridge-utils.sh

        mkdir -p /var/run/network
        for i in $(ifquery --list --allow auto); do
                ports=$(ifquery $i | sed -n -e's/^bridge_ports: //p')
                for port in $(bridge_parse_ports $ports); do
                        case $port in
                                $INTERFACE|$INTERFACE.*)
                                        ifup --allow auto $i
                                        brctl addif $i $port && ifconfig $port 0.0.0.0 up
                                        break
                                        ;;
                        esac
                done
                ifup $i
        done
end script

Clint, does that look reasonable to you?

Serge Hallyn (serge-hallyn) wrote on 2011-07-07:

#31

In oneiric the bridge-network-interface.conf is in fact gone. I've got a setup that reproduces this (using an lxc container connected to a bridge, br3, which is brought up by an upstart job which first sleeps two minutes).

I'll get a version of the libvirt-networking-up.conf that works for me and post a debdiff.

Serge Hallyn (serge-hallyn) wrote on 2011-07-07:

#32

Following jhunt's terrific suggestion, I changed the start on for
libvirt-bin to:

start on (runlevel [2345] and stopped networking RESULT=ok)

which is working perfectly.

Serge Hallyn (serge-hallyn) on 2011-07-07

Changed in libvirt (Ubuntu):
status:	Confirmed → In Progress

Launchpad Janitor (janitor) wrote on 2011-07-07:

#33

This bug was fixed in the package libvirt - 0.9.2-4ubuntu3

---------------
libvirt (0.9.2-4ubuntu3) oneiric; urgency=low

  * Fix /etc/init/libvirt-bin.conf start on to wait until networking.conf
    has stopped with success, meaning ifup -a completed successfully and
    all auto-started network devices are up. (LP: #495394)
-- Serge Hallyn <email address hidden> Thu, 07 Jul 2011 10:23:25 -0500

Changed in libvirt (Ubuntu):
status:	In Progress → Fix Released

Serge Hallyn (serge-hallyn) on 2011-07-07

description:

updated

Chris Halse Rogers (raof) wrote on 2011-07-11:

#34

Before accepting this upload I'd like to check that you don't want to fold the missing fixes for bug #697046 into this upload.

Serge Hallyn (serge-hallyn) wrote on 2011-07-11:

#35

Thanks, Chris. At this point I prefer to finish up with this one and push 697046 separately next. If you prefer that I combine them, please let me know and I'll happily do it.

Martin Pitt (pitti) wrote on 2011-07-12: Please test proposed package

#36

Hello Heiko, or anyone else affected,

Accepted libvirt into natty-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in libvirt (Ubuntu Natty):
status:	New → Fix Committed
tags:	added: verification-needed
Changed in libvirt (Ubuntu Maverick):
status:	New → Fix Committed

Martin Pitt (pitti) wrote on 2011-07-12:

#37

Hello Heiko, or anyone else affected,

Accepted libvirt into maverick-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in libvirt (Ubuntu Lucid):
status:	New → Fix Committed

Martin Pitt (pitti) wrote on 2011-07-12:

#38

Hello Heiko, or anyone else affected,

Accepted libvirt into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Franck78 (fbourdonnec) wrote on 2011-07-14: Re: autostart almost always fails on boot time host

#39

Hello,

Using ubuntu server 11.04

The fix "start on (runlevel [2345] and stopped networking RESULT=ok)" does nothing.
I have the patch applied:
ii libvirt-bin 0.8.8-1ubuntu6.3
ii libvirt0 0.8.8-1ubuntu6.3

BUT I found on my installation that when

1/ the default virbr0 in /etc/libvirt/qemu/networks
is activated, VMs startup is done.

2/ the default virbr0 is disabled (by removing autostart symlink),
VMs never start.

The VMs don't use virbr0 but my defined br0 and br1.

Franck

my /etc/network/interfaces FYI
auto lo br0 br1
# The loopback network interface
iface lo inet loopback

# The real primary network interfaces
iface eth0 inet manual
iface eth1 inet manual

iface br0 inet static
    bridge_ports eth0
    bridge_stp off
    address 10.0.0.200
    netmask 255.255.0.0
    gateway 10.0.0.100

iface br1 inet static
    bridge_ports eth1
    bridge_stp off
    address 10.2.0.200
    netmask 255.255.0.0
    broadcast 10.2.255.255

Rolf Leggewie (r0lf) on 2011-07-16

tags:

added: verification-failed
removed: verification-needed

Serge Hallyn (serge-hallyn) wrote on 2011-07-18:

#40

@Franck78,

Are you able to tell why the containers failed to start? Can you post the xml contents for the containers which fail to start (result of 'virsh dumpxml <containername>'), and do a

apport-collect 495394

?

Dave Walker (davewalker) wrote on 2011-07-18:

#41

bouncing back to verification-needed as it's not clear it has -failed. Thanks.

tags:

added: verification-needed
removed: verification-failed

Franck78 (fbourdonnec) wrote on 2011-07-18:

#42

@Serge,

Here is one domain; the other is a duplicate. It's a test machine.
I think apport-collect is unhappy...
See the log after this dumpxml.

Maybe the disk subsystem also has to be fully up first?
I have two 2TB drives, mirrored + LVM2, on a not-so-slow board
(six-core Phenom II on a Gigabyte GA-890FXA-UD7).

As it is a test server, you can ssh into it if you want. Just tell me.

<domain type='kvm'>
  <name>ipcop</name>
  <uuid>fe2d60ab-4dc8-677e-9876-6e848380dbf3</uuid>
  <description>Un IPcop de test</description>
  <memory>524288</memory>
  <currentMemory>524288</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64' machine='pc-0.14'>hvm</type>
    <boot dev='hd'/>
    <bootmenu enable='no'/>
  </os>
  <features>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/home/fbourdonnec/vm/ipcop/ipcop.raw'/>
      <target dev='hda' bus='ide'/>
      <address type='drive' controller='0' bus='0' unit='0'/>
    </disk>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:d3:d8:1a'/>
      <source bridge='br0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='bridge'>
      <mac address='52:54:00:a3:c1:dd'/>
      <source bridge='br1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' keymap='fr'/>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </memballoon>
  </devices>
</domain>

<domain type='kvm'>
  <name>ipcop</name>
  <uuid>fe2d60ab-4dc8-677e-9876-6e848380dbf3</uuid>
  <description>Un IPcop de test
login root:test
green 10.0.0.50 admin:test
</description>
  <memory>524288</memory>
  <currentMemory>524288</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64' machine='pc-0.14'>hvm</type>
    <boot dev='hd'/>
    <bootmenu enable='no'/>
  </os>
  <features>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/home/fbourdonnec/vm/ipcop/ipcop.raw'/>
      <target dev='hda' bus='ide'/>
      <address type='drive' controller='0' bus='0' unit='0'/>
    </disk>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:d3:d8:1a'/>
      <source bridge='br0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='bridge'>
      <mac address='52:54:00:a3:c1:dd'/>
      <source bridge='br1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' keymap='fr'/>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </memballoon>
  </devices>
</domain>

fbourdonnec@vmserver:~$ sudo apport-collect 495394

The authorization page:
 (https://launchpad.net/+authorize-token?oauth_token=x...s&allow_permission=DESKTOP_INTEGRATION)
should be opening in your browser. Use your browser to authorize
this program to access Launchpad on your behalf.

Waiting to hear from Launchpad about your decision...

*** You are not the reporter of this problem report. It is much easier to mark a bug as a duplicate of another than to move your comments and attachments to a new bug.

Subsequently, we recommend that you file a new bug report using "apport-bug" and make a comment in this bug about the one you file.

Do you really want to proceed?

What would you like to do? Your options are:
  Y: Yes
  N: No
  C: Cancel
Please choose (Y/N/C): y
Package libvirt not installed and no hook available, ignoring
Package libvirt not installed and no hook available, ignoring
Package libvirt not installed and no hook available, ignoring

*** Updating problem report

No additional information collected.

Press any key to continue...

No pending crash reports. Try --help for more information.

fbourdonnec@vmserver:~$ dpkg -l|grep libvirt
ii  libvirt-bin                     0.8.8-1ubuntu6.3           the programs for the libvirt library
ii  libvirt0                        0.8.8-1ubuntu6.3            library for interfacing with different virtualization systems
ii  python-libvirt                  0.8.8-1ubuntu6.2        libvirt Python bindings

fbourdonnec@vmserver:~$ virsh list
 Id Name                 State
----------------------------------

Disk subsystem

md0 : active raid1 sdb3[1] sda3[0]
      1953414749 blocks super 1.2 [2/2] [UU]
      
md1 : active raid1 sdb2[1] sda2[0]
      97644 blocks super 1.2 [2/2] [UU]

and on top of that the LVM manager

--- Volume group ---
  VG Name               system
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  8
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                4
  Open LV               4
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               1,82 TiB
  PE Size               4,00 MiB
  Total PE              476907
  Alloc PE / Size       31105 / 121,50 GiB
  Free  PE / Size       445802 / 1,70 TiB
  VG UUID               ClZqRf-Xnkr-LmnG-z7x4-YBfH-qx38-Tz8cay
  --- Logical volume ---
  LV Name                /dev/system/root
  VG Name                system
  LV UUID                W3pVH9-rmbp-g6ZI-Kavy-rPwJ-pQ70-yKFsvC
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                37,25 GiB
  Current LE             9536
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           251:0
   
  --- Logical volume ---
  LV Name                /dev/system/home
  VG Name                system
  LV UUID                PSyldS-QARJ-xY1X-tlCe-0Bxt-TZs6-GrIYhG
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                57,25 GiB
  Current LE             14656
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           251:1
   
  --- Logical volume ---
  LV Name                /dev/system/swap
  VG Name                system
  LV UUID                U5LAoW-kwbh-APj3-7P29-LTkl-R54h-YJJD09
  LV Write Access        read/write
  LV Status              available
  # open                 2
  LV Size                8,38 GiB
  Current LE             2145
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           251:2
   
  --- Logical volume ---
  LV Name                /dev/system/var
  VG Name                system
  LV UUID                KBoXgg-bQCQ-Ba3O-KLBE-P3CJ-Gup9-g1duvZ
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                18,62 GiB
  Current LE             4768
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           251:3

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2011-07-18:

#43

@Franck78,

Thanks for the info.

Yes, slow storage is not addressed by this and also needs to be fixed. For that one, I want to discuss with upstream, as we'll probably want to use a generic tool to enumerate all the storage needed by autostart domains. Considering the many complicated possibilities, it seems a daunting task.

I suppose a simple, non-intelligent way to go about it would be to grab all 'source file=' and 'source dev=' lines inside a <disk > .. </disk> stanza, and, in a /etc/init/libvirt-storage-waiter.conf, which is

start on mounted and not started libvirt-bin

do something like

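pre-start script
   # If libvirt-bin is already starting or running, there is nothing to wait for.
   if status libvirt-bin | grep -q start; then
      stop
      exit 0
   fi
   # Give up for now if any disk needed by an autostart domain is not readable yet.
   for f in `/sbin/enumerate-libvirt-autostart-files`; do
      if [ ! -r "$f" ]; then
         stop
         exit 0
      fi
   done
   initctl emit -n libvirt-storage-ready
end script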

and have /etc/init/libvirt-bin.conf
start on (runlevel [2345] and stopped networking STATUS=ok and libvirt-storage-ready)
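
As a rough illustration of the "non-intelligent" enumeration described above, here is a minimal Python sketch of what a helper like /sbin/enumerate-libvirt-autostart-files might do. The helper does not exist in the package; the function names and the assumption that autostart domains are symlinked under /etc/libvirt/qemu/autostart are mine, not part of the proposed patch.

```python
import glob
import xml.etree.ElementTree as ET

# Assumption: autostart domains are symlinked into this directory.
AUTOSTART_GLOB = "/etc/libvirt/qemu/autostart/*.xml"


def disk_sources(domain_xml):
    """Return the file= and dev= paths of every <disk><source> in one domain."""
    root = ET.fromstring(domain_xml)
    paths = []
    # Only <source> elements directly inside <disk> are considered, so
    # <interface><source bridge=.../> entries are ignored.
    for source in root.findall("./devices/disk/source"):
        for attr in ("file", "dev"):
            value = source.get(attr)
            if value:
                paths.append(value)
    return paths


def enumerate_autostart_files(pattern=AUTOSTART_GLOB):
    """Return the unique disk paths used by all autostart domains, in order."""
    seen = []
    for xml_path in sorted(glob.glob(pattern)):
        with open(xml_path, encoding="utf-8") as handle:
            for path in disk_sources(handle.read()):
                if path not in seen:
                    seen.append(path)
    return seen
```

Printing each entry of enumerate_autostart_files() on its own line would give the pre-start loop something to iterate over. Network-backed or pool-backed disks are not handled here, which is part of why the general case looks daunting.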

Revision history for this message

Franck78 (fbourdonnec) wrote on 2011-07-19:

#44

Hello,
The disk subsystem may or may not be involved on my system. It is difficult to say, and it has no relation to 'virbr0' (when virbr0 is defined, everything is OK, at least with two VMs booting).
I have compiled 0.9.2 and 0.9.3 for my 11.04 system. I will try again with those updated libvirt-bin builds (an incredible number of bugs fixed in every release!).
Btw, 'upstart' is strange: "and stopped networking STATUS=ok", for me, means when networking has been properly shut down... I need to find the upstart howto ;-)

Franck

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2011-07-19:

#45

debdiff (5.5 KiB, text/plain)

For posterity, here is a debdiff doing sort of what I was thinking for slow storage

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2011-07-19:

#46

@Franck78

no doubt, counterintuitive :) it means that the upstart job has finished, though.

Thanks for the info. If you find a new bug responsible, please feel free to open a new bug.

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2011-07-20: Re: [Bug 495394] Re: autostart almost always fails on boot time host

#47

Excerpts from Serge Hallyn's message of Tue Jul 19 19:47:40 UTC 2011:
> @Franck78
>
> no doubt, counterintuitive :) it means that the upstart job has
> finished, though.
>
> Thanks for the info. If you find a new bug responsible, please feel
> free to open a new bug.
>

That is indeed quite counter-intuitive.. a result of naming a
task something that sounds more like a state. It should be named
'configure-static-network'

The 'stopped networking' bit should go away in oneiric. I'm polishing
off the last bits of a new event, 'static-network-up', which means
all interfaces marked 'auto' in /etc/network/interfaces are "up". This
should allow things that can't handle transient network interfaces to
at least try to start at the right time, which is, for the most part,
what 'start on stopped networking' is an attempt to do.
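
To make the 'auto' condition a bit more concrete, here is a small Python sketch that lists the interfaces marked 'auto' in an /etc/network/interfaces file and reports which of them do not exist yet under /sys/class/net. The function names are hypothetical, and checking that a device exists is a weaker condition than the interface being "up"; this is only an approximation, not how the event itself is implemented.

```python
import os


def auto_interfaces(interfaces_text):
    """Return the interface names listed on 'auto' lines."""
    names = []
    for line in interfaces_text.splitlines():
        stripped = line.strip()
        # Whole-line comments are skipped.
        if not stripped or stripped.startswith("#"):
            continue
        words = stripped.split()
        if words[0] == "auto":
            names.extend(words[1:])
    return names


def missing_interfaces(interfaces_text, sysfs="/sys/class/net"):
    """Return the 'auto' interfaces that have no device directory in sysfs."""
    return [
        name
        for name in auto_interfaces(interfaces_text)
        if not os.path.isdir(os.path.join(sysfs, name))
    ]
```

Calling missing_interfaces() on the contents of /etc/network/interfaces just before libvirt starts would show, for example, whether br0 or br1 had not been created yet.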

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2011-07-21: Re: autostart almost always fails on boot time host

#48

The lucid-proposed package just passed my test case. As I wrote the fix I don't know if I can verify.

Serge Hallyn (serge-hallyn) on 2011-07-21

description:

updated

Revision history for this message

Franck78 (fbourdonnec) wrote on 2011-07-23:

#49

Long debug log showing the problem (73.3 KiB, text/plain)

Hello *,
@Serge,

I'm now pretty sure where the problem is.
It is disk related.
The patch that waits for 'disk' is not sufficient.

I have replaced libvirt 0.8.8 with a locally compiled 0.9.3.
Nothing changes: only one VM (out of two set to autostart) starts.

Short libvirt log, with logging activated:

20:38:48.728: 1226: info : libvirt version: 0.9.3
20:38:48.728: 1226: error : qemuMonitorIORead:487 : Unable to read from monitor: Connection reset by peer
20:38:48.736: 1234: error : qemuMonitorTextGetPtyPaths:1960 : operation failed: failed to retrieve chardev info in qemu with 'info chardev'
20:38:49.751: 1234: error : qemuAutostartDomain:156 : Failed to autostart VM 'ipcop-clone': operation failed: failed to retrieve chardev info in qemu with 'info chardev'
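
For longer logs, the failed autostarts can be picked out mechanically. Below is a minimal Python sketch that parses lines in the format shown above (time, thread id, level, function:line, message) and collects the names from "Failed to autostart VM" errors. The regular expressions are inferred from these few sample lines only, so they are an assumption about the log format rather than a definition of it.

```python
import re

# time: thread: level : function:line : message
LOG_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}): (?P<thread>\d+): "
    r"(?P<level>\w+) : (?P<func>\w+):(?P<line>\d+) : (?P<message>.*)$"
)
AUTOSTART_FAILURE = re.compile(r"Failed to autostart VM '([^']+)'")


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def failed_autostarts(lines):
    """Return the domain names reported in 'Failed to autostart VM' errors."""
    failed = []
    for line in lines:
        entry = parse_line(line)
        if entry and entry["level"] == "error":
            hit = AUTOSTART_FAILURE.search(entry["message"])
            if hit:
                failed.append(hit.group(1))
    return failed
```

Run over the four lines above, failed_autostarts() reports only 'ipcop-clone'; the version banner has no function:line field and is simply skipped.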

Every VM is tried. Some fail ;-)

The complete debug log shows what is wrong.

Libvirt tries to open some socket/file under "/var/lib/libvirt/......"

AND /var is not ready (not mounted, residing on the LVM system).

See the full log in the attachment:

#L1 starts ipcop-clone
#L104 failure declared for this VM
#L876 starts ipcop
#L1005 successfully opens the monitor channel and goes on

Can you fix your patch for this?
It also needs to wait for some utility directories like /var/run and /var/lib!

Franck

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2011-07-25:

#50

@Franck78

libvirt won't start until mounted-varrun has happened, so what you are
describing *should* not be happening. Can you open a new bug, preferably
using 'apport-bug libvirt-bin'? Please include your /etc/fstab and a
description of your storage setup (lvm config etc).

Revision history for this message

Dave Walker (davewalker) wrote on 2011-07-25:

#51

Verified that this bug is resolved on Lucid with the proposed package, with no obvious regressions (basic functionality works as expected)

Revision history for this message

Dave Walker (davewalker) wrote on 2011-07-25:

#52

Verified the proposed package for Maverick, as above. Thanks.

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2011-07-25:

#53

Still need verification on natty. Thanks!

tags:	added: verification-done removed: verification-needed
tags:	added: verification-done-lucid verification-done-maverick verification-needed removed: verification-done

Revision history for this message

Dave Walker (davewalker) wrote on 2011-07-25:

#54

Verification for Natty now complete. Thanks.

tags:

added: verification-done verification-done-natty
removed: verification-needed

Revision history for this message

Launchpad Janitor (janitor) wrote on 2011-07-25:

#55

This bug was fixed in the package libvirt - 0.7.5-5ubuntu27.14

---------------
libvirt (0.7.5-5ubuntu27.14) lucid-proposed; urgency=low

  * Fix /etc/init/libvirt-bin.conf start on to wait until networking.conf
    has stopped with success, meaning ifup -a completed successfully and
    all auto-started network devices are up. (LP: #495394)
-- Serge Hallyn <email address hidden> Thu, 07 Jul 2011 16:41:04 -0500

Changed in libvirt (Ubuntu Lucid):
status:	Fix Committed → Fix Released

Revision history for this message

Launchpad Janitor (janitor) wrote on 2011-07-25:

#56

This bug was fixed in the package libvirt - 0.8.3-1ubuntu19

---------------
libvirt (0.8.3-1ubuntu19) maverick-proposed; urgency=low

  * Fix /etc/init/libvirt-bin.conf start on to wait until networking.conf
    has stopped with success, meaning ifup -a completed successfully and
    all auto-started network devices are up. (LP: #495394)
-- Serge Hallyn <email address hidden> Thu, 07 Jul 2011 16:48:36 -0500

Changed in libvirt (Ubuntu Maverick):
status:	Fix Committed → Fix Released

Revision history for this message

Launchpad Janitor (janitor) wrote on 2011-07-25:

#57

This bug was fixed in the package libvirt - 0.8.8-1ubuntu6.3

---------------
libvirt (0.8.8-1ubuntu6.3) natty-proposed; urgency=low

  * Fix /etc/init/libvirt-bin.conf start on to wait until networking.conf
    has stopped with success, meaning ifup -a completed successfully and
    all auto-started network devices are up. (LP: #495394)
-- Serge Hallyn <email address hidden> Thu, 07 Jul 2011 16:54:35 -0500

Changed in libvirt (Ubuntu Natty):
status:	Fix Committed → Fix Released

Revision history for this message

Franck78 (fbourdonnec) wrote on 2011-07-25:

#58

well,
if it is not "/var/something"
there is also
"/dev/pts"

Something is wrong around the "qemuMonitor" routines.

Find out why qemuMonitorIORead:487 is triggered and the bug is closed!

I really don't understand the logic of 'upstart' well enough to trace into it, sorry.
Opening a new bug on the same subject? Why?

Franck

****** when it is OK ********
20:57:10.438: 1234: debug : qemuMonitorTextCommandWithHandler:237 : Send command 'info chardev' for write with FD -1
....
20:57:10.439: 1234: debug : qemuMonitorTextCommandWithHandler:242 : Receive command reply ret=0 rxLength=109 rxBuffer='charmonitor: filename=unix:/var/lib/libvirt/qemu/ipcop.monitor,server
charserial0: filename=pty:/dev/pts/1'

******** when it is NOT ok **********
20:57:08.555: 1234: debug : qemuMonitorTextCommandWithHandler:237 : Send command 'info chardev' for write with FD -1
....
20:57:08.648: 1228: error : qemuMonitorIORead:487 : Unable to read from monitor: Connection reset by peer
20:57:08.648: 1228: debug : qemuMonitorIO:610 : Error on monitor Unable to read from monitor: Connection reset by peer
20:57:08.648: 1228: debug : qemuMonitorIO:644 : Triggering error callback
20:57:08.648: 1228: debug : qemuProcessHandleMonitorError:170 : Received error on 0x142f5a0 'ipcop-clone'
20:57:08.648: 1234: debug : qemuMonitorSend:811 : Send command resulted in error Unable to read from monitor: Connection reset by peer
...
20:57:08.648: 1234: debug : qemuMonitorTextCommandWithHandler:242 : Receive command reply ret=-1 rxLength=0 rxBuffer='(null)'
20:57:08.648: 1234: error : qemuMonitorTextGetPtyPaths:1960 : operation failed: failed to retrieve chardev info in qemu with 'info chardev'
20:57:08.648: 1234: debug : qemuProcessWaitForMonitor:1170 : qemuMonitorGetPtyPaths returned -1
20:57:08.648: 1234: debug : qemuProcessStop:2801 : Shutting down VM 'ipcop-clone' pid=1347 migrated=0

I HAVE checked that /var is mounted, with an "ls /var" in init/libvirt-bin.conf & libvirt-bin-storage.conf.
It is mounted OK.
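
One caveat on that check: "ls /var" can succeed even when the logical volume is not mounted, because the mount-point directory exists on the root filesystem either way. A stricter test is to look for the path in /proc/mounts. A minimal Python sketch follows; the function names are hypothetical, and paths containing spaces (which /proc/mounts escapes) are not handled.

```python
def mount_points(mounts_text):
    """Return the set of mount points listed in /proc/mounts-style text."""
    points = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field 0 is the device, field 1 is the mount point.
        if len(fields) >= 2:
            points.add(fields[1])
    return points


def is_mounted(path, mounts_text):
    """Return True if path is itself a mount point in the given mounts text."""
    normalized = path.rstrip("/") or "/"
    return normalized in mount_points(mounts_text)
```

For example, is_mounted("/var", open("/proc/mounts").read()) is True only once /var itself has been mounted.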

Franck

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2011-07-25: Re: [Bug 495394] Re: autostart almost always fails on boot time host

#59

Quoting Franck78 (<email address hidden>):
> well,
> if it is not "/var/something"
> there is also
> "/dev/pts"
>
> Something is wrong around the "qemuMonitor" routines.
>
> find why qemuMonitorIORead:487 is triggered and the bug is closed !

Libvirt's monitor is trying to read from an already-opened monitor fd.
Qemu has crashed, perhaps unable to find some backing store, perhaps for
some other reason. /var itself is ok - even if it had gotten
overmounted, libvirt is reading from an fd and the overmount wouldn't
matter.

We need to figure out why qemu crashed. There may be useful info in
/var/log/libvirt/qemu/ipcop-clone.log

> I really don't understand the logic of 'upstart' well enough to trace into it, sorry.
> Opening a new bug on the same subject? Why?

Because yours has a different cause, and is therefore a different bug
with similar symptoms. Globbing the info with that from other bugs
makes it harder to cleanly reason about it and minimizes our chances of
finding the root cause.

summary:

- autostart almost always fails on boot time host
+ autostart fails on boot time host when network devices not ready

Revision history for this message

Gary Pope (gaz-6) wrote on 2013-10-11:

#60

Comment #8 fixed it for me too, but I had to use a longer sleep value. I used sleep 8.
This fixed /etc/init.d/libvirt-bin on a Debian 7.1.0 host starting an Ubuntu 12.04.3 VM.

Gaz

Revision history for this message

Gary Pope (gaz-6) wrote on 2013-10-11:

#61

Sorry, comment #60 was meant to say sleep 60 (60 seconds, not 4 as in comment #8).

Revision history for this message

Ruben Portier (rubenportier) wrote on 2016-11-13:

#62

I'm having a similar issue with Ubuntu 16.04 as the host. When the guests are set to autostart and I boot the host, I can see an error message on the guests while they boot: "Failed to start Raise network interfaces".

I've tried the methods above by adding a sleep to the init file. However, the init file /etc/init/libvirt-bin.conf is not used anymore, as echoing into a file from it does not work (it seems this init file is deprecated, so why is it still there?). I've found another file, /etc/init.d/libvirt-bin, which looks a lot newer, though it still states the year 2007 at the top of the file. This file is completely different and I have no idea where to put the sleep command to test.

I'm not sure if this is the exact same issue, as I couldn't find any similar issues on the internet. I've tried changing my bridge configuration, without success. I have bridge_maxwait 0 on my bridge, but it does not help. The host is able to use the internet right after boot, so it seems the interface (bridge) is actually up and working. The guests work for a short period of time after boot. I've noticed that the default route (IPv6) has an expire option on it and that it is counting down. When the route expires, the guest is no longer reachable over the internet. After restarting the guest's interface, it works again without any problems.

So, this seems like an issue with the bridge not being completely ready when libvirt autostarts my guests. I have no idea why this happens or why it takes longer for the bridge to fully initialize. I hope someone can help me find out if this reported bug is related to my issue, or if I'm having a different one.

Thanks in advance!

Revision history for this message

Ruben Portier (rubenportier) wrote on 2016-11-19:

#63

Apparently, this issue was caused by the host having high utilisation on the CPU on host boot. This caused the guests to not have their interfaces configured in time. The link on an interface can be up while the interface itself is not yet fully initialised. This causes the networking-services to accept neighbor advertisements and router advertisements which can add IPv6 routes to the routing table.

When the interface is almost fully configured, it sets the default route, as I had a gateway rule in my interfaces file. This fails ("File exists") because there already is a default IPv6 route for this particular gateway, assigned via RA (router advertisement). To solve this issue, I simply removed the gateway from the interfaces file, as the gateway is auto-assigned via RA. Another fix would be to disable accepting RA on this particular interface, in the default value, or on all interfaces, by using:

pre-up sysctl -w net.ipv6.conf.device.accept_ra=0

where "device" is "all", "default" or the actual device name (eth0, em0 etc.).
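
To check what a given device is currently set to, the same setting can be read back from /proc/sys. Here is a minimal Python sketch; the helper names are hypothetical, and it assumes the usual mapping of the sysctl key net.ipv6.conf.<device>.accept_ra onto a path under /proc/sys.

```python
import os


def accept_ra_key(device):
    """Return the sysctl key for a device ('all', 'default', or e.g. 'eth0')."""
    return "net.ipv6.conf.{}.accept_ra".format(device)


def accept_ra_path(device, proc_root="/proc/sys"):
    """Return the /proc/sys file that backs the sysctl key."""
    return os.path.join(proc_root, "net", "ipv6", "conf", device, "accept_ra")


def read_accept_ra(device, proc_root="/proc/sys"):
    """Return the current accept_ra value, or None if it cannot be read."""
    try:
        with open(accept_ra_path(device, proc_root)) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None
```

For example, read_accept_ra("eth0") returning 0 after boot would suggest the pre-up line took effect for that interface.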

I hope this can help some people suffering from the same problem as I had. It took me way too long to find the cause of this problem. The actual fix was found on this link: http://unix.stackexchange.com/questions/306139/rtnetlink-answers-file-exists-after-adding-ipv6-address.

Ubuntu
libvirt package

autostart fails on boot time host when network devices not ready

	Status	Importance	Assigned to
libvirt (Ubuntu)	Fix Released	High	Serge Hallyn
Lucid	Fix Released	Undecided	Unassigned
Maverick	Fix Released	Undecided	Unassigned
Natty	Fix Released	Undecided	Unassigned
