apparmor denies related to nvdimms/nfit

Bug #1871354 reported by Christian Ehrhardt 
14
This bug affects 1 person
Affects Status Importance Assigned to Milestone
libvirt (Ubuntu)
Fix Released
High
Christian Ehrhardt 

Bug Description

[Impact]

 * Backport the apparmor rules that I upstreamed (have ack of JDstrand)
   to avoid nvdimm nitialization of libpmem (done always even if not used)
   to not spill Denials into the log all the time.
 * The upload also contains a backport of a CVE fix I was pointed to
   by mdeslaur

[Test Case]

 * Start qemu and check apparmor denials (pre tested on PPAs)
 * Check virsh interaction if any timeouts change (I've seen none in pre-
   tests)

[Regression Potential]

 * This is not adding new denials, only adding allows. Thereby the
   regression risk is minimal.
   If anything then allowing to read "/" itself would be disallowed in some
   environments, but according to jdstrand it is safe in any LSB compliant
   sytems and as always users that want extra isolation can add denial
   rules to local apparmor overrides.
 * The timeout is safe as well as it does not add/remove timeouts, instead
   it only forbids changing it via the read-only connection

[Other Info]

 * This isn't technically an SRU, but I have learned that filling these
   templates helps the release Team to accept changes while in 20.04 Freeze
   time.

---

On guest start I see:

apparmor="DENIED" operation="open" profile="libvirt-785b6ea8-24b9-4d9f-9e6e-6a08ac8a95ff" name="/"·
apparmor="DENIED" operation="open" profile="libvirt-785b6ea8-24b9-4d9f-9e6e-6a08ac8a95ff" name="/sys/bus/nd/devices/"

The latter could be allowed if we understand why it happens?
The former looks like a programming error and I'd want to know where it comes from exactly.

CMDline was
usr/bin/qemu-system-x86_64 \
-name guest=f-test1,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-10-f-test1/master-key.aes \
-machine pc-q35-focal,accel=kvm,usb=off,dump-guest-core=off \
-cpu qemu64 \
-m 4096 \
-overcommit mem-lock=off \
-smp 8,sockets=8,cores=1,threads=1 \
-uuid 2afb2039-c0a8-4408-9fa2-17e7f7488fc0 \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=31,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-boot strict=on \
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
-device qemu-xhci,id=usb,bus=pci.2,addr=0x0 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.3,addr=0x0 \
-blockdev '{"driver":"file","filename":"/var/lib/uvtool/libvirt/images/x-uvt-b64-Y29tLnVidW50dS5jbG91ZC5kYWlseTpzZXJ2ZXI6MjAuMDQ6YW1kNjQgMjAyMDAzMzA=","node-name":"libvirt-3-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-3-format","read-only":true,"driver":"qcow2","file":"libvirt-3-storage","backing":null}' \
-blockdev '{"driver":"file","filename":"/var/lib/uvtool/libvirt/images/f-test1.qcow","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":false,"driver":"qcow2","file":"libvirt-2-storage","backing":"libvirt-3-format"}' \
-device virtio-blk-pci,scsi=off,bus=pci.4,addr=0x0,drive=libvirt-2-format,id=virtio-disk0,bootindex=1 \
-blockdev '{"driver":"file","filename":"/var/lib/uvtool/libvirt/images/f-test1-ds.qcow","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device virtio-blk-pci,scsi=off,bus=pci.5,addr=0x0,drive=libvirt-1-format,id=virtio-disk1 \
-netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=34 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:72:98:2c,bus=pci.1,addr=0x0 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-chardev socket,id=charchannel0,fd=36,server,nowait \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
-vnc 127.0.0.1:0 \
-spice port=5901,addr=127.0.0.1,disable-ticketing,seamless-migration=on \
-device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pcie.0,addr=0x1 \
-device virtio-balloon-pci,id=balloon0,bus=pci.6,addr=0x0 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on

Tags: server-next

Related branches

CVE References

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

/sys/bus/nd should be NFIT-ND as described in https://lwn.net/Articles/640891/

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Debug approach #1 (Thanks amurray)

handle SIGUSR1 noprint nostop
set follow-fork-mode child
set $outside = 1
catch syscall openat
commands
  silent
  set $outside = ! $outside
  if ($outside && $rax >= 0)
    continue
  end
  if ( !$outside )
    continue
  end
  if ($rax == -1 || $rax == -13)
    printf "`openat` returned a negative value %d\n",$rax
    bt
  else
    continue
  end
end

This was always too late ...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

It could be triggered right when loading the libpmem lib or similar.

From strace:

63706 0.000028 stat("/sys/bus/nd/devices", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 <0.000011>
63706 0.000050 stat("/sys/bus/nd/devices", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 <0.000009>
63706 0.000031 openat(AT_FDCWD, ".", O_RDONLY) = 3 <0.000011>
63706 0.000031 fchdir(3) = 0 <0.000007>
63706 0.000024 openat(AT_FDCWD, "/sys/bus/nd/devices", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4 <0.000010>

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Download full text (3.3 KiB)

As assumed libpmem init:

break open if strcmp($rdi,".") == 0

Breakpoint 1, __libc_open64 (file=file@entry=0x7ffff734be26 <dot> ".", oflag=oflag@entry=0) at ../sysdeps/unix/sysv/linux/open64.c:37
37 ../sysdeps/unix/sysv/linux/open64.c: No such file or directory.
(gdb) bt
#0 __libc_open64 (file=file@entry=0x7ffff734be26 <dot> ".", oflag=oflag@entry=0) at ../sysdeps/unix/sysv/linux/open64.c:37
#1 0x00007ffff72a7f4c in fts_open (argv=<optimized out>, argv@entry=0x7fffffffd990, options=options@entry=65, compar=compar@entry=0x0) at ../sysdeps/wordsize-64/../../io/fts.c:219
#2 0x00007ffff772fa4c in fs_new (path=path@entry=0x7ffff776877e "/sys/bus/nd/devices") at ../../src/../src/common/fs_posix.c:59
#3 0x00007ffff7731414 in os_auto_flush () at ../../src/../src/common/os_auto_flush_linux.c:187
#4 0x00007ffff7733449 in pmem_has_auto_flush () at pmem.c:218
#5 0x00007ffff77355d5 in pmem_init_funcs (funcs=funcs@entry=0x7ffff776e1a0 <Funcs>) at ../../src/../src/libpmem/x86_64/init.c:467
#6 0x00007ffff7734174 in pmem_init () at pmem.c:786
#7 0x00007ffff7fe0b8a in call_init (l=<optimized out>, argc=argc@entry=1, argv=argv@entry=0x7fffffffdb58, env=env@entry=0x7fffffffdb68) at dl-init.c:72
#8 0x00007ffff7fe0c91 in call_init (env=0x7fffffffdb68, argv=0x7fffffffdb58, argc=1, l=<optimized out>) at dl-init.c:30
#9 _dl_init (main_map=0x7ffff7ffe190, argc=1, argv=0x7fffffffdb58, env=0x7fffffffdb68) at dl-init.c:119
#10 0x00007ffff7fd013a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#11 0x0000000000000001 in ?? ()
#12 0x00007fffffffdf56 in ?? ()
#13 0x0000000000000000 in ?? ()

Catchpoint 2 (call to syscall openat), __GI___open64_nocancel (file=0x55555654eb30 "/sys/bus/nd/devices", oflag=oflag@entry=591872) at ../sysdeps/unix/sysv/linux/open64_nocancel.c:45
45 ../sysdeps/unix/sysv/linux/open64_nocancel.c: No such file or directory.
(gdb) bt
#0 __GI___open64_nocancel (file=0x55555654eb30 "/sys/bus/nd/devices", oflag=oflag@entry=591872) at ../sysdeps/unix/sysv/linux/open64_nocancel.c:45
#1 0x00007ffff7273ea7 in __opendir (name=<optimized out>) at ../sysdeps/posix/opendir.c:92
#2 0x00007ffff72a70ac in fts_build (sp=sp@entry=0x55555654eae0, type=type@entry=3) at ../sysdeps/wordsize-64/../../io/fts.c:636
#3 0x00007ffff72a82c6 in fts_read (sp=0x55555654eae0) at ../sysdeps/wordsize-64/../../io/fts.c:396
#4 0x00007ffff772faa0 in fs_read (f=f@entry=0x55555654eaa0) at ../../src/../src/common/fs_posix.c:77
#5 0x00007ffff7731430 in os_auto_flush () at ../../src/../src/common/os_auto_flush_linux.c:195
#6 0x00007ffff7733449 in pmem_has_auto_flush () at pmem.c:218
#7 0x00007ffff77355d5 in pmem_init_funcs (funcs=funcs@entry=0x7ffff776e1a0 <Funcs>) at ../../src/../src/libpmem/x86_64/init.c:467
#8 0x00007ffff7734174 in pmem_init () at pmem.c:786
#9 0x00007ffff7fe0b8a in call_init (l=<optimized out>, argc=argc@entry=1, argv=argv@entry=0x7fffffffdb58, env=env@entry=0x7fffffffdb68) at dl-init.c:72
#10 0x00007ffff7fe0c91 in call_init (env=0x7fffffffdb68, argv=0x7fffffffdb58, argc=1, l=<optimized out>) at dl-init.c:30
#11 _dl_init (main_map=0x7ffff7ffe190, argc=1, argv=0x7fffffffdb58, env=0x7fffffffdb68) at dl-init.c:119
#12 0x00007ffff7fd013a in...

Read more...

tags: added: server-next
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

On init amon other things libpmem will do:

161 /*
162 * os_auto_flush -- check if platform supports auto flush for all regions
163 *
164 * Traverse "/sys/bus/nd/devices" path to find all the nvdimm regions,
165 * then for each region checks if "persistence_domain" file exists and
166 * contains "cpu_cache" string.
167 * If for any region "persistence_domain" entry does not exists, or its
168 * context is not as expected, assume eADR is not available on this platform.
169 */

That will open "." and PWD for a libvirt executed qemu will be nothing => "/"
Followed by "/sys/bus/nd/devices"

But from the code I see that it expects there to be symlinks.
We will need the patterns those will follow to add rules for those as well.

TODO:
1. silence access to "/"
2. allow enumeration (read only) of
  /sys/bus/nd/devices r,
  /sys/bus/nd/devices/* r,
3. find where the symlinks usually point to and add these

Can we find a way to only add these when pmem is actually used?
In that case we want to silence #2 as well, but allow it if used

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Useful reference: /sys/devices/platform/{e820_pmem,nfit_test.*}/region*/persistence_domain r,
But for a fix I'd need to get access to a system with the real thing or detailed info about one.

Until then people that want to use it should allow it for "their setup" by adding to:
  /etc/apparmor.d/local/abstractions/libvirt-qemu

/sys/bus/nd/devices r,
/sys/bus/nd/devices/* r,
/sys/devices/platform/{e820_pmem,nfit_test.*}/ndbus[0-9]*/region[0-9]* r,
/sys/devices/platform/{e820_pmem,nfit_test.*}/ndbus[0-9]*/region[0-9]*/persistence_domain r,

This list might increase once we know a few real setups content in these paths.
Once we know that we can discuss if it is safe to allow that unconditionally or not.

P.S. We still would want to silence the denial until allowed.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

@Adam - I have subscribed you as I know you have worked on pmem. If on such a system (best with all "kinds" of nvdimm use cases set up) one could collect the full content of /sys so that I can track down the set of paths that we will need to allow overall.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Wow, this is very different ...

On a qemu system with faked nvdimms I got this:
ubuntu@focal-nvdimm:~$ ll /sys/bus/nd/devices/
total 0
drwxr-xr-x 2 root root 0 Apr 8 12:59 ./
drwxr-xr-x 4 root root 0 Apr 8 12:59 ../
lrwxrwxrwx 1 root root 0 Apr 8 12:59 btt0.0 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region0/btt0.0/
lrwxrwxrwx 1 root root 0 Apr 8 12:59 btt1.0 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region1/btt1.0/
lrwxrwxrwx 1 root root 0 Apr 8 12:59 dax0.0 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region0/dax0.0/
lrwxrwxrwx 1 root root 0 Apr 8 12:59 dax1.0 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region1/dax1.0/
lrwxrwxrwx 1 root root 0 Apr 8 12:59 namespace0.0 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region0/namespace0.0/
lrwxrwxrwx 1 root root 0 Apr 8 12:59 namespace1.0 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region1/namespace1.0/
lrwxrwxrwx 1 root root 0 Apr 8 12:59 ndbus0 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/
lrwxrwxrwx 1 root root 0 Apr 8 12:59 nmem0 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/nmem0/
lrwxrwxrwx 1 root root 0 Apr 8 12:59 nmem1 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/nmem1/
lrwxrwxrwx 1 root root 0 Apr 8 12:59 pfn0.0 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region0/pfn0.0/
lrwxrwxrwx 1 root root 0 Apr 8 12:59 pfn1.0 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region1/pfn1.0/
lrwxrwxrwx 1 root root 0 Apr 8 12:59 region0 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region0/
lrwxrwxrwx 1 root root 0 Apr 8 12:59 region1 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region1/

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

libpmem would scan those two:
 region0 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region0/
 region1 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region1/

These faked entries have no "persistence_domain" entry.

I wonder:
 - how predictable those path namings are
   - this hit us on lib-init so we can't know anything about the device yet
 - how they would be on various HW types and setup variations

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

maybe
  /sys/devices/LNXSYSTM:[0-9][0-9]/LNXSYBUS[0-9][0-9]/ACPI*/ndbus[0-9]/region[0-9] r,
  /sys/devices/LNXSYSTM:[0-9][0-9]/LNXSYBUS[0-9][0-9]/ACPI*/ndbus[0-9]/region[0-9]/persistence_domain r,

But would those paths differ in other cases e.g. with real Hardware?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Also adding Jeff:
<andreas> cpaelzer: try jeff lane (https://directory.canonical.com/orgchart/Jeff%20Lane/), he was driving the ipmctl sru and he has nvdimms

Thanks Andreas for the hint

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

If conditional a fix (once we know the paths) would be an extension of these
 - https://libvirt.org/git/?p=libvirt.git;a=commit;h=ac254f342ff59b18792394ce9b01b1c8c2da7e28
 - https://libvirt.org/git/?p=libvirt.git;a=commit;h=999998a7920c11a3c8969bba6e32714ea810508c
Adding checks if </pmem> [1]: is enabled and then adding the necessary paths

But as mentioned before this is on libpmem init, at this time we can't rely on any device/config data (e.g. if hot added later). Therefore once we know the paths in question we need to discuss if they are safe enough to be added unconditionally to the base template.

P.S. still looking for a way to silence the denials until we want to allow.

[1]: https://libvirt.org/formatdomain.html#elementsMemory

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

...
[15:28] <cpaelzer> so adding a deny (btw I can only add to places users can't modify) they can't allow it later right?
[15:28] <jjohansen> cpaelzer: correct
[15:28] <cpaelzer> so how could I silence access to a given path
[15:28] <cpaelzer> but have the chance for user to later allow it?
...
[15:31] <jjohansen> unfortunately deny is the only way currently to control quieting atm
[15:31] <cpaelzer> ok, so for now I have to choose between a) the denial is slightly annoying now, but users can allow it later AND b) silencing it now but the few users that want to allow it will be unable to do so
[15:31] <cpaelzer> is that summary correct?
[15:31] <jjohansen> yes
[15:31] <cpaelzer> thanks

Due to the above I think I could make virt-aa-helper:
 - if a </pmem> is present add allowing the paths
 - if no </pmem> is present add a deny rule to silence the denials

The only case left open would be systems that start without any pmem and want to hot-add it later. Those should be pretty rare one would think (even rarer than nvdimms on its own already are).

Next steps:
 - get a full assortment of possible paths (I hope Adam or Jeff can help)

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

FYI, Christian sent up a patch and I responded to it here: https://www.redhat.com/archives/libvir-list/2020-April/msg00441.html

Revision history for this message
Adam Borowski (aborowski) wrote :

These days, a lot of file reads from /sys and /proc is wrapped inside ndctl rather than PMDK, so you'd need to review both. And, a good part of accesses happen only on true NVDIMMs -- although if I recall correctly, those are all under /sys/bus/nd/devices. There's also /sys/devices, /sys/dev/char/%u:%u/*, /proc/self/*, etc.

I'm not sure what paths need to be listed in AppArmor, but grepping through ndctl's and pmdk's sources would be a good start.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Upstream accepted as:
  8f61fd6b apparmor: avoid denials on libpmem initialization

Ready to get into Focal

Changed in libvirt (Ubuntu):
status: New → In Progress
assignee: nobody → Christian Ehrhardt  (paelzer)
importance: Undecided → High
description: updated
description: updated
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 6.0.0-0ubuntu7

---------------
libvirt (6.0.0-0ubuntu7) focal; urgency=medium

  * d/p/ubuntu-aa/lp-1871354*: fix apparmor denials on libpmem init
    (LP: #1871354)
  * d/p/ubuntu/CVE-CVE-2020-10701-api-disallow-virDomainAgentSetResponseTimeout
    -on-rea.patch: avoid DOS through read only connections
    CVE-2020-10701

 -- Christian Ehrhardt <email address hidden> Wed, 15 Apr 2020 12:29:12 +0200

Changed in libvirt (Ubuntu):
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.