> > non-sense.
> > have you ever seen 'udevadm settle' take 120 seconds? Can you give
> > an example of when it *could* do that? The only thing I can reasonably
>
> https://ask.fedoraproject.org/en/question/81006/slow-boot-time-udev-settle/
> https://askubuntu.com/questions/888010/about-systemd-udev-settle-service-and-slow-booting
> https://bugzilla.redhat.com/show_bug.cgi?id=735866
> https://bbs.archlinux.org/viewtopic.php?id=189369
> https://forums.opensuse.org/showthread.php/506828-Udev-3-minute-delay-on-boot
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=754987
> https://lists.debian.org/debian-user/2015/04/msg01592.html

Every single example link above is in regard to slow boot times. Boot
is a time when there is a flurry of udev events. curtin does not run in
that time frame, which is a *huge* outlier in the general life of a
system.

To get actual numbers on how long we spend waiting for udev events that
I think are important, I put up:
  https://code.launchpad.net/~smoser/curtin/+git/curtin/+merge/352791

A full vmtest run produced 13098 calls to udevadm settle over 257
installs. Of those calls, 1555 took over 0.1 seconds. The longest came
during TrustyHWEXTestRaid5Bcache, at 1.809 seconds.
  http://paste.ubuntu.com/p/t2k6tDw6x7/

Here is a log of total time spent in settle along with total time spent
in 'curtin install'.
  http://paste.ubuntu.com/p/rNsqfpmxkr/

Admittedly the total time we spend "waiting" can be 10% of the time of
our install, but I'd still argue that those waits are almost certainly
valid.

> Settle can block up to 120 seconds any time one or more events
> that are in the queue when settle is called that do not get processed.
> The event in queue may be unrelated to storage; it can be any event
> in the system (net, block, usb); settle doesn't care.

And you think that during the time frame in which an installer is
running there is likely to be a flurry of net or usb events coming in?
Go ahead, find a busy system that is not booting and run 'udevadm
monitor'. See if you see a single event over the course of 10 minutes
or so.

I gave that a try: I ran 'lxc launch' on a lxd with zfs. I was honestly
floored at how many events came through (1163 UDEV, 1163 KERNEL).
I ran a loop of 'time udevadm settle' during that, and it never took
more than 1/100th of a second.

> > come up with are disk based operations on large md or something.
> > Those are exactly the reasons we want to settle. Anything else that
> > was seemingly unrelated but is taking significant time honestly
> > probably something important to an installer. I'd much rather
> > see a log entry that showed a timeout failed on a settle than
> > try to diagnose some related failure.
>
> Recall this discussion:
> https://bugs.launchpad.net/cloud-init/+bug/1766287
> https://code.launchpad.net/~raharper/cloud-init/+git/cloud-init/+merge/344198

Both of our responses on that bug are that we'd much rather have
*stable* behavior than fast behavior. That is what I'm arguing for
here, so I'm confused about why you're pointing to it.

> My argument is that we should only call settle when we *need* it.
> Extra calls to settle just because at best waste a few milliseconds,
> at worse large amounts of latency to the install path.
>
> Calling settle when no events are in the queue doesn't
> ensure safety. One needs to know when you've queued events (ie you've
> created a new device)
> and then calling settle is effective, and we use exit-if-exists to
> keep us from waiting
> for events that are not related to the device we created.
>
> We know when we need to call it.

I accept that in some scenarios the caller of a function may not want
udevadm settle to be called, because they know they are accounting for
that at a higher level. I've suggested I'm fine with the option to pass
in 'skip_settle' to different functions.
I think that in general, the function should behave in a safe way for
the caller. If the caller wants to shave off precious milliseconds then
they can, but the default should be safety.

> > I'd much prefer that any function that makes changes to disks
> > settle by default. If the caller wants to *not* settle (because
>
> Note that lvm_scan does *not* create any devices. It *may* activate
> lvs if some are present.

If it can activate logical volumes, couldn't those logical volumes be
part of a volume group which then created a device as a result of an
event? There is a *ton* of magic like that in udev.

> > they have a better way or more knowledge) then we can do that.
> > The function is just more caller friendly if it doesn't have race
> > conditions waiting to explode in the user's face.
>
> The assumption that we always need to settle is the core issue.
> Callers always need to know whether they need to call settle,
> and in curtin, we have that knowledge.

That's an argument for a python library that is only usable by uber
elite people who hold all knowledge of multiple distributions, kernels,
udev subsystem versions, hooks and other things in their head.
Looking at 'git log', neither you nor I fit into that group.

> Also where is the race? You keep pointing to the --exist-if-exists scenario
> but that's unrelated to an lvm scan were we might activate an LV
> of whom's path we don't know (and wouldn't)

Are you sure that lvm_scan (which does pvscan followed by
'vgscan --mknodes') cannot possibly end up causing udev events that
produce new volume groups, which in turn cause the creation or deletion
of block devices?

I just want to make functions we have like 'lvm_scan' generally usable.
I do not want them to have pointy edges that bite us or other users of
curtin.

If you're certain that 'lvm_scan' by itself is safe then I'm fine to
not have it call udevadm settle. If it is not safe then let's add a
'skip_settle' argument and the caller can use it.
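
To make the 'skip_settle' suggestion concrete, here is a minimal sketch
of the shape I have in mind. It is not curtin's actual code: the
signatures, the injectable 'run' parameter and the returned elapsed
time are assumptions made for illustration. The only real commands used
are pvscan, 'vgscan --mknodes' and 'udevadm settle' with its
--exit-if-exists and --timeout options.

```python
import subprocess
import time


def udevadm_settle(exists=None, timeout=None, run=subprocess.check_call):
    """Run 'udevadm settle' and return the seconds spent waiting.

    'run' is a hypothetical hook so the command runner can be replaced
    in tests; by default the command is really executed.
    """
    cmd = ['udevadm', 'settle']
    if exists:
        # Stop waiting as soon as the named path shows up.
        cmd.append('--exit-if-exists=%s' % exists)
    if timeout:
        cmd.append('--timeout=%s' % timeout)
    start = time.monotonic()
    run(cmd)
    return time.monotonic() - start


def lvm_scan(skip_settle=False, run=subprocess.check_call):
    """Rescan LVM metadata, settling afterwards unless told not to.

    Safe by default: only a caller that handles settling at a higher
    level passes skip_settle=True. Returns the time spent in settle,
    or None when the settle was skipped.
    """
    run(['pvscan'])
    run(['vgscan', '--mknodes'])
    if skip_settle:
        return None
    return udevadm_settle(run=run)
```

A caller that already settles on a specific device path would call
lvm_scan(skip_settle=True); everyone else gets the safe behavior
without having to know anything about udev. Returning the elapsed time
also makes it cheap to log how long each settle really took.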