Comment 53 for bug 802626

Herton R. Krzesinski (herton) wrote :

I also reproduced the issue here, after it was pointed out that some people started seeing this frequently with the 3.0.0-14 kernel stable update in Oneiric.

After debugging, I reached the same conclusions as posted here. It's a "deadlock" between vgchange and udev exiting: after the resume dm ioctl, vgchange keeps waiting for a semaphore to be "unlocked" (reach zero). That happens when the kernel sends the DM_COOKIE from the ioctl back to userspace (udev) and udev runs dmsetup udevcomplete for the same cookie, which drops the semaphore count and lets vgchange go on. But if udev is exiting in the initramfs before the kernel cookie event is sent, it either ignores any later kernel events or accepts only the firmware loading events (the latter if you are running udev from updates, where Andy fixed the problem of firmware requests arriving while udev is exiting).
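To make the sequence concrete, here is a toy simulation of the wait, not the real libdevmapper or udev code. The class and function names are made up for illustration, and the timeout exists only so the example terminates; in the bug, the wait simply never returns.

```python
import threading


class CookieSemaphore:
    """Toy stand-in for the per-cookie semaphore (hypothetical, not libdevmapper)."""

    def __init__(self, count=1):
        self._count = count
        self._cond = threading.Condition()

    def udevcomplete(self):
        # Conceptually what "dmsetup udevcomplete <cookie>" does: drop the count.
        with self._cond:
            self._count -= 1
            self._cond.notify_all()

    def wait_for_zero(self, timeout):
        # vgchange blocks here until the count reaches zero.
        # Returns False if the timeout expires first (the "hang" case).
        with self._cond:
            return self._cond.wait_for(lambda: self._count <= 0, timeout=timeout)


def simulate(udev_exiting, timeout=0.2):
    """Return True if the simulated vgchange gets past the wait."""
    sem = CookieSemaphore()

    def udev_handles_cookie_event():
        if udev_exiting:
            return  # udev is shutting down and ignores the kernel event
        sem.udevcomplete()

    worker = threading.Thread(target=udev_handles_cookie_event)
    worker.start()
    completed = sem.wait_for_zero(timeout)
    worker.join()
    return completed
```

With udev_exiting=False the wait completes; with udev_exiting=True nothing ever drops the count, which is the situation vgchange ends up in.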

This dm_cookie mechanism is the udev synchronization process in lvm/dm, and that's why disabling it makes the problem go away: as already stated, vgchange then no longer relies on the DM_COOKIE event returned from the kernel.

I think watershed makes a difference only by changing the timing of when things run; I wasn't able to see any problem with it.

Perhaps running vgchange with --noudevsync only inside the initramfs would be an acceptable workaround, if no synchronization is needed for any of the initramfs cases (which seems to be the case). Doing it system-wide doesn't make sense, as udev is always running and managing device nodes, and I expect we could have problems with node management or with other users.

But maybe it's safer to just make udev process the remaining DM_COOKIE events from the kernel, as we already do with udev from updates in Oneiric, for example, where we already have a special case for timed events (events with TIMEOUT set, i.e. firmware loading). I'm proposing the following patch as a solution; it works well here so far, with no need to disable udev synchronization anymore.
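The rule I have in mind can be pictured roughly like this. This is only an illustration of the idea in Python, not the patch itself: representing an event as a dict of its properties and the function name are assumptions made for the sketch.

```python
def accept_while_exiting(event):
    """Decide whether an exiting udev should still process a kernel event.

    `event` is a dict of uevent properties (a hypothetical representation).
    Events with TIMEOUT set are the existing special case (firmware loading);
    events carrying DM_COOKIE are the additional case proposed here.
    """
    return "TIMEOUT" in event or "DM_COOKIE" in event


# Example events (property values are made up for illustration).
firmware_event = {"ACTION": "add", "SUBSYSTEM": "firmware", "TIMEOUT": "60"}
dm_event = {"ACTION": "change", "SUBSYSTEM": "block", "DM_COOKIE": "12345"}
other_event = {"ACTION": "add", "SUBSYSTEM": "input"}
```

Here the firmware and dm events would still be handled during exit, while the unrelated event would be dropped as before.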