libvirt-daemon error "virHashSearch:727 : Hash operation not allowed during iteration"

Bug #1789659 reported by Silvio Moioli
30
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Ubuntu Cloud Archive
Fix Released
Undecided
Unassigned
Pike
Won't Fix
Undecided
Unassigned
libvirt (Ubuntu)
Fix Released
Undecided
Unassigned
Bionic
Fix Released
High
Christian Ehrhardt 

Bug Description

[Impact]

 * Libvirt does an overzealous check on concurrent hash usage which breaks
   some automation like for example Terraform. Upstream moved the need to
   lock up the stack where applicable and dropped the checks as they were
   superfluous.

 * Backport the upstream change dropping the extra checks on hash usage

[Test Case]

 * Run the attached script test-lp1789659.sh
  (based on ideas of https://bugzilla.redhat.com/show_bug.cgi?id=1581364#c7)

With the aforementioned patch, this error is not produced any more.

[Regression Potential]

 * If the assumption that all remaining callers are safe isn't correct
   there could be code using invalid hashes accessing guest information.
   This needs thorough tests and regression checks as well as an extra
   review if it can be applied to the libvirt 4.0 we have in Bionic.
   This was done by me and also upstream did the same.
   Quote from the patch: "Most important uses seem to be covered. Callers
   have now a greater responsability, notably the ability to execute some
   operations while iterating were reliably forbidden before are now
   accepted."
   It is important to see that the internal locking was inherently racy
   (boolean) so we are replaceing a broken lock-ckeck with no check as
   upper layers were supposed to do it correctly. That is true for what is
   in the archive but there could be out-of archive things not doing so
   calling directly into the python api bindings for example.
   Never the less those are today just as affected by the locking being
   broken, so those do not "loose" a lot.
   All major exploiters we know of are good - the rest is a regression
   risk we are willing to take in favor of fixing all the valid use cases
   being affected by the broken lock handling.

[Other Info]

 * n/a

----

In the SUSE Manager/Uyuni[1] project CIs we make use of Terraform[2], a tool to automate infrastructure deployments, together with terraform-provider-libvirt[3], a Terraform provider (plugin) that allow interaction with libvirt.

By default, this setup will issue several libvirt API calls concurrently in an hard-to-predict order, as demonstrated by logs that I am attaching.

In the "bad log" example, the following line appears:

error : virHashSearch:727 : Hash operation not allowed during iteration

According to analysis of a similar problem by Red Hat in an OpenStack scenario[4], this is has been fixed in upstream libvirt via commit 4d7384eb9ddef2008cb0cc165eb808f74bc83d6b [5].

I tested the patch and it applies cleanly to the 4.0.0 package shipping with Bionic and that successfully resolves this issue.

Please evaluate including this patch.

Thanks in advance

[1] http://uyuni-project.org/
[2] https://www.terraform.io/
[3] https://github.com/dmacvicar/terraform-provider-libvirt
[4] https://bugzilla.redhat.com/show_bug.cgi?id=1576464#c3
[5] https://github.com/libvirt/libvirt/commit/4d7384eb9ddef2008cb0cc165eb808f74bc83d6b.patch

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: libvirt-daemon 4.0.0-1ubuntu8.3
ProcVersionSignature: Ubuntu 4.15.0-33.36-generic 4.15.18
Uname: Linux 4.15.0-33-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
ApportVersion: 2.20.9-0ubuntu7.2
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Wed Aug 29 16:05:10 2018
InstallationDate: Installed on 2014-06-12 (1539 days ago)
InstallationMedia: Ubuntu 14.04 LTS "Trusty Tahr" - Release amd64 (20140417)
SourcePackage: libvirt
UpgradeStatus: Upgraded to bionic on 2018-05-02 (119 days ago)

Revision history for this message
Silvio Moioli (silvio-moioli) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

This is upstram since 4.3 and thereby fixed in Cosmic.
Haven't had the time to look in depth and consider an SRU for Bionic ...

Changed in libvirt (Ubuntu):
status: New → Fix Released
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

It essentially removes a check for concurrency.
At the time it was added it was considered safe, but never the less in 4.0 (Bionic) there could be races left - just as much as there could be races not yet detected upstream.
I'd think this has a higher than usual regression risk.

OTOH it applies clean and I haven't found a revert or follow on fix so far (a few months later).

Well, in detail it is replacing a broken locking (by a boolean) with delegating the responsibility to callers which are already considered safe.

I think it can be done, but depends on the SRU Teams opinion eventually.
To Drive an SRU I'd need steps to reproduce, do you have something to share that is better than "deploy a huge thing with terraform with those 20.000 commands"?
I assume for your testing you might already have simplified it.

I'll add the empty SRU Template and ask you to fill as much as possible, but as mentioned the steps to reproduce would be most important IMHO.

description: updated
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Incomplete for now waiting for help on repro steps for latter SRU process.

Changed in libvirt (Ubuntu Bionic):
status: New → Incomplete
Revision history for this message
Silvio Moioli (silvio-moioli) wrote :

I have just verified that the reproducer for Red Hat Enterprise Linux is valid for Bionic:

1.Start a guest:
$ virsh start test1

2.Do 'virsh list' in a loop:
$ for i in {1..1000}; do virsh list; done

3.Open another terminal, do 'virsh domstats' in a loop:
$ for i in {1..1000}; do virsh domstats; done

4.Check the libvirtd.log:
$ cat /var/log/libvirt/libvirtd.log | grep -i "Hash operation not allowed during"
2018-09-04 06:57:00.761+0000: 28687: error : virHashForEach:597 : Hash operation not allowed during iteration
2018-09-04 06:57:00.773+0000: 28684: error : virHashForEach:597 : Hash operation not allowed during iteration
2018-09-04 06:57:02.040+0000: 28687: error : virHashForEach:597 : Hash operation not allowed during iteration
2018-09-04 06:57:06.567+0000: 28687: error : virHashForEach:597 : Hash operation not allowed during iteration
2018-09-04 06:57:09.761+0000: 28684: error : virHashForEach:597 : Hash operation not allowed during iteration

https://bugzilla.redhat.com/show_bug.cgi?id=1581364#c7

With the aforementioned patch, this error is not produced any more.

Revision history for this message
Silvio Moioli (silvio-moioli) wrote :

JFI: SUSE Linux Enterprise and openSUSE variants also added this patch to respective packages.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The reproduce is all I needed - thanks a lot!

There are quite a few things in flight atm SRUing to Bionic eventually.
I'll take a look at this reproducing and making it available as soon as those other things resolve.

Changed in libvirt (Ubuntu Bionic):
status: Incomplete → Triaged
assignee: nobody →  Christian Ehrhardt  (paelzer)
description: updated
Changed in libvirt (Ubuntu Bionic):
importance: Undecided → High
Revision history for this message
Anish Gupta (anish70) wrote :

Christian,

We are also seeing these errors on our Guests:

Sep 1 14:40:31 libvirtd[40608]: 2018-09-01 21:40:31.960+0000: 40662: error : virHashSearch:727 : Hash operation not allowed during iteration

..

Sep 1 17:59:58 libvirtd[4181]: 2018-09-02 00:59:58.703+0000: 4181: error : virNetlinkEventCallback:700 : nl_recv returned with error: No buffer space available
..

Looking for BIONIC SRU that has this fix.

thanks,
Anish

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I had some more reports on this on IRC and per mail, so I set prio high.

description: updated
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I did a code review to check for potential changes between 4.0 (what we have in Bionic) and 4.3 (when upstream created the fix) but found none that would make this change non-backportable (I was afraid of unrelated commits being required to make this safe - therefore the extra dive into the src).

Now prepping the actual Backport for a automated regression check.

Changed in libvirt (Ubuntu Bionic):
status: Triaged → In Progress
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Extended the regression considerations by some content from the commit and discussions I found.
It is safe IMHO, but I need to explain the SRU team why.

description: updated
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Can appear in two forms (maybe more):
  virHashSearch:727 : Hash operation not allowed during iteration
  virHashForEach:597 : Hash operation not allowed during iteration

Turns out one of my test nodes had hit this in June and August once, just was not fatal to the tests that ran.

description: updated
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

PPA: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3404

Tested and verified the proposed fix working as reported on similar tests done before.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I ran a set of regression tests - all looking good.
Summary of the x86 test (others the same):
stage0-prep: Pass 4 Failed 0 Skip 0 + 0 - RC 0 in 13 minutes
stage1-migrate: Pass 276 Failed 0 Skip 0 + 0 - RC 0 in 95 minutes
stage2-cross: Pass 22 Failed 0 Skip 0 + 1 - RC 0 in 35 minutes
stage3-misc: Pass 103 Failed 0 Skip 0 + 0 - RC 0 in 45 minutes
stage4-qemu: Pass 1 Failed 0 Skip 0 + 0 - RC 0 in 41 minutes

Also the bileto driven autopkgtests look good (only debci which is force-badtest on britney).
=> https://bileto.ubuntu.com/excuses/3404/bionic.html

Pushing for the SRU teams consideration ...

Revision history for this message
Andy Whitcroft (apw) wrote : Please test proposed package

Hello Silvio, or anyone else affected,

Accepted libvirt into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/libvirt/4.0.0-1ubuntu8.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in libvirt (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-bionic
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Download full text (4.6 KiB)

Pre update test:
$ sudo ./test-lp1789659.sh
+ concurrency=40
+ duration=60
+ guests=4
+ uvt-simplestreams-libvirt sync --source http://cloud-images.ubuntu.com/daily arch=amd64 release=bionic label=daily
+ date +%F %T
+ starttime=2018-09-10 10:16:58
+ set +x
+ sleep 60s
uvt-kvm: error: libvirt: storage volume 'b1.qcow' exists already
+ date +%F %T
+ endtime=2018-09-10 10:17:58
+ pkill -P 5261
+ seq 1 4
+ uvt-kvm destroy b1
+ uvt-kvm destroy b2
+ uvt-kvm destroy b3
+ /bin/true
+ uvt-kvm destroy b4
+ /bin/true
+ journalctl -u libvirtd --since 2018-09-10 10:16:58 --until 2018-09-10 10:17:58
+ grep Hash operation
Sep 10 10:16:58 node-horsea libvirtd[3115]: 2018-09-10 10:16:58.914+0000: 3277: error : virHashForEach:597 : Hash operation not allowed during iteration
Sep 10 10:16:59 node-horsea libvirtd[3115]: 2018-09-10 10:16:59.326+0000: 3278: error : virHashForEach:597 : Hash operation not allowed during iteration
[...] ~80 of them

Update:
$ sudo apt install libvirt-clients libvirt-daemon libvirt-daemon-driver-storage-rbd libvirt-daemon-system libvirt0
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  linux-headers-4.15.0-32 linux-headers-4.15.0-32-generic linux-image-4.15.0-32-generic linux-modules-4.15.0-32-generic linux-modules-extra-4.15.0-32-generic
Use 'sudo apt autoremove' to remove them.
Suggested packages:
  libvirt-daemon-driver-storage-gluster libvirt-daemon-driver-storage-sheepdog libvirt-daemon-driver-storage-zfs numad radvd auditd systemtap nfs-common zfsutils pm-utils
The following packages will be upgraded:
  libvirt-clients libvirt-daemon libvirt-daemon-driver-storage-rbd libvirt-daemon-system libvirt0
5 upgraded, 0 newly installed, 0 to remove and 7 not upgraded.
Need to get 4.115 kB of archives.
After this operation, 4.096 B disk space will be freed.
Get:1 http://archive.ubuntu.com/ubuntu bionic-proposed/main amd64 libvirt-daemon-system amd64 4.0.0-1ubuntu8.5 [80,7 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic-proposed/main amd64 libvirt-clients amd64 4.0.0-1ubuntu8.5 [595 kB]
Get:3 http://archive.ubuntu.com/ubuntu bionic-proposed/main amd64 libvirt-daemon-driver-storage-rbd amd64 4.0.0-1ubuntu8.5 [15,4 kB]
Get:4 http://archive.ubuntu.com/ubuntu bionic-proposed/main amd64 libvirt-daemon amd64 4.0.0-1ubuntu8.5 [2.173 kB]
Get:5 http://archive.ubuntu.com/ubuntu bionic-proposed/main amd64 libvirt0 amd64 4.0.0-1ubuntu8.5 [1.251 kB]
Fetched 4.115 kB in 1s (4.609 kB/s)
Preconfiguring packages ...
(Reading database ... 197154 files and directories currently installed.)
Preparing to unpack .../libvirt-daemon-system_4.0.0-1ubuntu8.5_amd64.deb ...
Unpacking libvirt-daemon-system (4.0.0-1ubuntu8.5) over (4.0.0-1ubuntu8.4) ...
Preparing to unpack .../libvirt-clients_4.0.0-1ubuntu8.5_amd64.deb ...
Unpacking libvirt-clients (4.0.0-1ubuntu8.5) over (4.0.0-1ubuntu8.4) ...
Preparing to unpack .../libvirt-daemon-driver-storage-rbd_4.0.0-1ubuntu8.5_amd64.deb ...
Unpacking libvirt-daemon-driver-storage-rbd (4.0.0-1ubuntu8.5) over (4.0.0-1ubuntu8.4) ...
Preparing to unpack .../libvirt-daemon_4.0.0-1ubuntu8.5_amd64.deb ...

Read more...

tags: added: verification-done verification-done-bionic
removed: verification-needed verification-needed-bionic
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for libvirt has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 4.0.0-1ubuntu8.5

---------------
libvirt (4.0.0-1ubuntu8.5) bionic; urgency=medium

  * d/p/ubuntu-aa/lp-1788603-fix-ptrace-rules-with-kernel-4.18.patch:
    avoid issues with newer kernels >=4.18 (LP: #1788603)
  * d/p/ubuntu/lp-1789659-don-t-check-for-parallel-iteration-in-hash.patch:
    remove broken and redundant check for parallel iteration in hash functions
    (LP: #1789659)

 -- Christian Ehrhardt <email address hidden> Tue, 28 Aug 2018 07:26:19 +0200

Changed in libvirt (Ubuntu Bionic):
status: Fix Committed → Fix Released
Revision history for this message
Michael Craft (michaelgcraft) wrote :

Will this be back ported to xenial libvirt 3.6.x?

Revision history for this message
Anish Gupta (anish70) wrote :

Does this also fix comment#8 ?

thanks,
Anish

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

@Anish - it fixes the first of your messages in comment #8
The second is yet unknown/undebugged to me - in case you hit that still regularly I'd need an extra bug with much more information. The info in this bug here was about the hash issue that is now fixed.

@Michael - I had no intention to bring that to 3.6 as that was in Ubuntu Artful which is now out of service. Are you referring to the 3.6 version that is in UCA [1] Pike for Xenial?
I'm adding a task for the cloud archive to consider picking this up as well, but will set it to incomplete. If this really is the "target" you have thought of set it to "new".

[1]: https://wiki.ubuntu.com/OpenStack/CloudArchive#Queens

Changed in cloud-archive:
status: New → Incomplete
Revision history for this message
GOVINDA TATTI (gtatti) wrote :

Christian,

We still see the following error message even with the latest bionic libvirt packages.

Message:
========
Sep 27 14:25:47 xpl-dvt-41 libvirtd[4927]: 2018-09-27 21:25:47.387+0000: 4927: error : virNetlinkEventCallback:700 : nl_recv returned with error: No buffer space available

Package:
=======
lab@xpl-dvt-35:~$ dpkg -l | grep libvirt
ii gir1.2-libvirt-glib-1.0:amd64 1.0.0-1 amd64 GObject introspection files for the libvirt-glib library
ii libsys-virt-perl 4.0.0-1 amd64 Perl module providing an extension for the libvirt library
ii libvirt-bin 4.0.0-1ubuntu8.5 amd64 programs for the libvirt library
ii libvirt-clients 4.0.0-1ubuntu8.5 amd64 Programs for the libvirt library
ii libvirt-daemon 4.0.0-1ubuntu8.5 amd64 Virtualization daemon
ii libvirt-daemon-system 4.0.0-1ubuntu8.5 amd64 Libvirt daemon configuration files
ii libvirt-glib-1.0-0:amd64 1.0.0-1 amd64 libvirt GLib and GObject mapping library
ii libvirt0:amd64 4.0.0-1ubuntu8.5 amd64 library for interfacing with different virtualization systems
ii python-libvirt 4.0.0-1 amd64 libvirt Python bindings

As you requested, we file a separate bug for it. Thanks.

Revision history for this message
Michael Craft (michaelgcraft) wrote :

@Christian

"Are you referring to the 3.6 version that is in UCA [1] Pike for Xenial?"

That is correct. We are leveraging that repo for pike deployment on xenial today. (OSA Pattern)

Flagged to new. thanks in advance.

Changed in cloud-archive:
status: Incomplete → New
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

@Govinda - the remaining issue you mention is unrelated to the one we discussed and fixed here.
For everybody else that wants to follow, Govinda file bug 1794997 about that (Thanks).

@Marc - ok then all our bug tagging is correct, lets see what they say.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

Apologies, pike is EOL and this is fixed in bionic.

Changed in cloud-archive:
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.