Confined processes inside container cannot fully access host pty device passed in by lxc exec

Bug #1641236 reported by Tyler Hicks
This bug affects 10 people
Affects             Status      Importance   Assigned to   Milestone
apparmor (Ubuntu)   Confirmed   Undecided    Unassigned
lxd (Ubuntu)        Invalid     Undecided    Unassigned
tcpdump (Ubuntu)    Confirmed   High         Unassigned

Bug Description

Now that AppArmor policy namespaces and profile stacking are in place, I noticed odd stdout buffering behavior when running confined processes via lxc exec. Much more stdout data is buffered before getting flushed when the program is confined by an AppArmor profile inside of the container.

I see that lxd is calling openpty(3) in the host environment, using the returned fd as stdout, and then executing the command inside of the container. This results in an AppArmor denial because the file descriptor returned by openpty(3) originates outside of the namespace used by the container.

The denial is likely from glibc calling fstat(), from inside the container, on the file descriptor associated with stdout to decide how much buffering to use. The fstat() is denied by AppArmor and glibc ends up handling the buffering differently than it would have if the fstat() had succeeded.
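As a rough illustration of why the type of stdout matters: C stdio typically line-buffers stdout when it is a terminal and fully buffers it otherwise. The sketch below is only a shell analogue of that decision, using `test -t` (an isatty-style check) rather than glibc's own fstat()-based code path; `describe_stdout` is a hypothetical helper name.

```shell
# Report whether this function's stdout is a terminal.
# A C program would normally line-buffer in the first case
# and fully buffer in the second.
describe_stdout() {
    if [ -t 1 ]; then
        echo "terminal"
    else
        echo "not a terminal"
    fi
}

describe_stdout
```

Run directly in a terminal this prints "terminal"; run as `describe_stdout | cat` it prints "not a terminal", because stdout is then a pipe.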

Steps to reproduce (using an up-to-date 16.04 amd64 VM):

Create a 16.04 container
$ lxc launch ubuntu-daily:16.04 x

Run tcpdump in one terminal and generate traffic in another terminal (wget google.com)
$ lxc exec x -- tcpdump -i eth0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
<Packet dump>
47 packets captured
48 packets received by filter
1 packet dropped by kernel
<ctrl-c>

Note that everything above <Packet dump> was printed immediately because it was printed to stderr. <Packet dump>, which is printed to stdout, was not printed until you pressed ctrl-c and the buffers were flushed thanks to the program terminating. Also, this AppArmor denial shows up in the logs:

audit: type=1400 audit(1478902710.025:440): apparmor="DENIED" operation="getattr" info="Failed name lookup - disconnected path" error=-13 namespace="root//lxd-x_<var-lib-lxd>" profile="/usr/sbin/tcpdump" name="dev/pts/12" pid=15530 comm="tcpdump" requested_mask="r" denied_mask="r" fsuid=165536 ouid=165536
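When comparing denials like the one above, it can help to pull out individual fields (operation, name, profile and so on). The following is a minimal sketch using sed; `aa_field` is a hypothetical helper, and the sample line is a shortened copy of the denial above.

```shell
# Print the value of one key="value" or key=value field from an audit line.
# $1 = field name, $2 = audit line
aa_field() {
    key=$1
    line=$2
    # Try a quoted value first, since it may contain spaces.
    value=$(printf '%s\n' "$line" | sed -n "s/.*[[:space:]]$key=\"\([^\"]*\)\".*/\1/p")
    if [ -z "$value" ]; then
        # Fall back to a bare value that ends at the next whitespace.
        value=$(printf '%s\n' "$line" | sed -n "s/.*[[:space:]]$key=\([^[:space:]\"]*\).*/\1/p")
    fi
    printf '%s\n' "$value"
}

# Shortened sample of the denial quoted in this report.
sample='audit: type=1400 apparmor="DENIED" operation="getattr" info="Failed name lookup - disconnected path" error=-13 profile="/usr/sbin/tcpdump" name="dev/pts/12" pid=15530 comm="tcpdump"'

aa_field operation "$sample"   # prints: getattr
aa_field name "$sample"        # prints: dev/pts/12
aa_field error "$sample"       # prints: -13
```

Note that the reported name is "dev/pts/12" without a leading slash, which matches the "disconnected path" message in the info field.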

Now run tcpdump unconfined and take note that <Packet dump> is printed immediately, before you terminate tcpdump. Also, there are no AppArmor denials.
$ lxc exec x -- aa-exec -p unconfined -- tcpdump -i eth0
...

Now run tcpdump confined but in lxc exec's non-interactive mode and note that <Packet dump> is printed immediately and no AppArmor denials are present. (Looking at the lxd code in lxd/container_exec.go, openpty(3) is only called in interactive mode.)
$ lxc exec x --mode=non-interactive -- tcpdump -i eth0
...

Applications that manually call fflush(stdout) are not affected by this as manually flushing stdout works fine. The problem seems to be caused by glibc not being able to fstat() the /dev/pts/12 fd from the host's namespace.

Tyler Hicks (tyhicks) wrote :

There's currently no way in the AppArmor policy language to allow the getattr operation on the passed in /dev/pts/12 file. The typical workaround of adding the attach_disconnected flag to the profile does not work here because *every* AppArmor profile inside of the container would need that flag.

John Johansen has thought out an AppArmor feature that would let the policy language permit this fd passing between namespaces, but it is a sizeable feature and is not on the immediate roadmap.

I haven't had a chance to think it through very much but I'm curious if the LXD developers have any ideas on how this can be solved in LXD. Maybe it is possible to call openpty() from inside the container's namespace? I'm not sure if that would work or if it is safe to do but maybe it is worth investigating.

Stéphane Graber (stgraber) wrote :

Getting openpty called in the container would solve a lot of problems for us but it's not possible to do in a safe way as it'd effectively rely on the container's filesystem which the container user can change or fake at will, allowing for attacks on the host's C library and LXD itself.

Stéphane Graber (stgraber) wrote :

Marking the LXD side of this as Invalid since there's unfortunately nothing we can really do about this.

Changed in lxd (Ubuntu):
status: New → Invalid
Christian Brauner (cbrauner) wrote :

I've reproduced this on a fresh standard xenial instance with LXD
2.0.8 and also on a xenial instance with a patched glibc that reports
ENODEV on ttyname{_r}() on a pty fd that does not exist:

root@x:~# ./enodev_on_pty_in_different_namespace
ttyname(): The pty device might exist in a different namespace: No such device
ttyname_r(): The pty device might exist in a different namespace: No such device

Christian Brauner (cbrauner) wrote :

On Tue, Jan 31, 2017 at 11:34:43AM +0100, Christian Brauner wrote:
> I've reproduced this on a fresh standard xenial instance with LXD
> 2.0.8 and also on a xenial instance with a patched glibc that reports
> ENODEV on ttyname{_r}() on a pty fd that does not exist:
>
> root@x:~# ./enodev_on_pty_in_different_namespace
> ttyname(): The pty device might exist in a different namespace: No such device
> ttyname_r(): The pty device might exist in a different namespace: No such device

So to make this a little more elaborate:
- I managed to reproduce this with an unpatched glibc inside and outside the
  container just like @Tyler outlined.
- I managed to reproduce this with a patched glibc inside the container and an
  unpatched glibc outside the container.
- I managed to reproduce this with a patched glibc inside and outside the
  container.

So a patched glibc which returns ENODEV in case a symlink like /proc/self/fd/0
points to a pts device that lives in another namespace does not improve the
situation. The problem that @Tyler outlined still exists.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in apparmor (Ubuntu):
status: New → Confirmed
Thomas Parrott (tomparrott) wrote :

I've been able to re-create this using a fresh install of Ubuntu 18.04 without LXC or LXD, using only network namespaces.

Set up 2 namespaces with IPVLAN:

ip netns add ns1
ip link add name ipv1 link enp0s3 type ipvlan mode l3s
ip link set dev ipv1 netns ns1
ip netns exec ns1 ip addr add 10.1.20.252/32 dev ipv1
ip netns exec ns1 ip link set ipv1 up
ip netns exec ns1 ip link set lo up
ip netns exec ns1 ip -4 r add default dev ipv1

ip netns add ns2
ip link add name ipv2 link enp0s3 type ipvlan mode l3s
ip link set dev ipv2 netns ns2
ip netns exec ns2 ip addr add 10.1.20.253/32 dev ipv2
ip netns exec ns2 ip link set ipv2 up
ip netns exec ns2 ip link set lo up
ip netns exec ns2 ip -4 r add default dev ipv2
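The two blocks above differ only in the namespace name, device name and address, so they can be folded into one helper. This is a sketch under assumptions: `setup_ipvlan_ns` and the `RUN` dry-run switch are hypothetical names, and the parent interface defaults to enp0s3 as in the commands above. By default it only prints the commands; running them for real needs root.

```shell
# RUN is a hypothetical dry-run switch: unset means "echo" (print only),
# RUN="" executes the commands (requires root).
RUN=${RUN-echo}

# $1 = namespace, $2 = ipvlan device, $3 = address/prefix, $4 = parent interface
setup_ipvlan_ns() {
    ns=$1
    dev=$2
    addr=$3
    parent=${4:-enp0s3}
    $RUN ip netns add "$ns"
    $RUN ip link add name "$dev" link "$parent" type ipvlan mode l3s
    $RUN ip link set dev "$dev" netns "$ns"
    $RUN ip netns exec "$ns" ip addr add "$addr" dev "$dev"
    $RUN ip netns exec "$ns" ip link set "$dev" up
    $RUN ip netns exec "$ns" ip link set lo up
    $RUN ip netns exec "$ns" ip -4 r add default dev "$dev"
}

setup_ipvlan_ns ns1 ipv1 10.1.20.252/32
setup_ipvlan_ns ns2 ipv2 10.1.20.253/32
```

Printing first makes it easy to compare the generated commands with the originals before running anything as root.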

Enter namespace 1 and start a ping to the other namespace:

sudo ip netns exec ns1 ping 10.1.20.253

Then run tcpdump in namespace 2 listening for all packets without DNS resolution:

sudo ip netns exec ns2 tcpdump -i any -nn

This doesn't output any captured packets.

However, running tcpdump with -l (make stdout line buffered) does help:

sudo ip netns exec ns2 tcpdump -i any -nn -l

poobalan.arumugam aka murphy (poobalan-arumugam) wrote :

This affects Ubuntu 18.04 LXD containers as well.
As per previous mentions for tcpdump:
a) using script does not change anything
b) connecting via ssh and not lxc exec has no effect
c) disabling AppArmor for tcpdump does work, i.e.:

/bin/ln -s /etc/apparmor.d/usr.sbin.tcpdump /etc/apparmor.d/disable/
/sbin/apparmor_parser -R /etc/apparmor.d

Walter (wdoekes) wrote :

Ah yes, that fixes things on Ubuntu/Jammy as well:

mkdir -p /etc/apparmor.d/disable
# (did not exist, over here)

ln -s /etc/apparmor.d/usr.bin.tcpdump /etc/apparmor.d/disable
# (note, no sbin, but bin)

apparmor_parser -R /etc/apparmor.d
# (this is indeed needed, instead of an apparmor restart)

Christian Boltz (cboltz) wrote :

> apparmor_parser -R /etc/apparmor.d

-R means to unload profiles, in this case all profiles in /etc/apparmor.d/. That's probably a bit ;-) too much...

I'd guess you want to unload only the tcpdump profile, which would be done with
apparmor_parser -R /etc/apparmor.d/usr.bin.tcpdump

An alternative would be to use aa-remove-unknown (run it with -n to see what it would unload).
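Putting the comments above together, here is a hedged sketch that locates the tcpdump profile under either name mentioned in this report (usr.bin.tcpdump or usr.sbin.tcpdump) and prints the commands to disable and unload only that profile. The function names are hypothetical, and the commands are printed rather than executed so they can be reviewed first.

```shell
# Print the path of the tcpdump profile, trying both names seen in this report.
# $1 = profile directory (defaults to /etc/apparmor.d)
find_tcpdump_profile() {
    dir=${1:-/etc/apparmor.d}
    for name in usr.bin.tcpdump usr.sbin.tcpdump; do
        if [ -e "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}

# Print (do not run) the commands that disable and unload just that profile.
print_disable_commands() {
    profile=$(find_tcpdump_profile "$1") || {
        echo "no tcpdump profile found" >&2
        return 1
    }
    dir=$(dirname "$profile")
    printf 'mkdir -p %s/disable\n' "$dir"
    printf 'ln -s %s %s/disable/\n' "$profile" "$dir"
    printf 'apparmor_parser -R %s\n' "$profile"
}
```

Calling `print_disable_commands` with no argument looks in /etc/apparmor.d; the printed commands then need to be run as root.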

Trent Lloyd (lathiat) wrote :

Note: This affects SSH as well, not only lxc exec. The SSH part is tracked in Bug #1667016, which is currently marked as a duplicate.

This still persists on focal. To work around it, I have to *both* use tcpdump with -l (line buffered mode) *and* pipe the output to cat. You also want to redirect stderr, otherwise it's silently lost.

# tcpdump -lni o-hm0 2>&1|cat

The apparmor errors I get are:
[ 6414.508990] audit: type=1400 audit(1666764106.013:360): apparmor="DENIED" operation="file_inherit" namespace="root//lxd-juju-5062b7-2-lxd-3_<var-snap-lxd-common-lxd>" profile="/usr/sbin/tcpdump" name="/dev/pts/2" pid=187936 comm="tcpdump" requested_mask="wr" denied_mask="wr" fsuid=1000000 ouid=1001000

I have determined the cause, which is that tcpdump is one of the few programs with its own restrictive apparmor profile (/etc/apparmor.d/usr.sbin.tcpdump). As part of that it locks down /dev to read-only:
> /dev/ r,

However that also means /dev/pts is read-only, hence the error above denies write access.

There is an abstraction #include <abstractions/consoles> which adds access to /dev/pts and other console outputs. It's included also in the profile for usr.bin.man.

Including this abstraction resolves the issue for me. I'll upload a patch.

Changed in apparmor (Ubuntu):
status: Confirmed → Invalid
Changed in tcpdump (Ubuntu):
status: New → Confirmed
Trent Lloyd (lathiat)
Changed in tcpdump (Ubuntu):
importance: Undecided → High
Trent Lloyd (lathiat)
Changed in apparmor (Ubuntu):
status: Invalid → Confirmed
Trent Lloyd (lathiat) wrote :

The above analysis is true for SSH, but I realise it's different for the PTY passed in by lxc exec.

My analysis may still be correct, but I am going to move this SSH fix over to Bug #1667016 so this bug can stay open for the general PTY/buffering issue.

There is a gap in my explanation: it's not clear to me why this doesn't also happen outside of a container.

Of note, the error I get initially suggests it couldn't resolve the path of the FD, which is probably under /dev/pts:
[ 9119.221342] audit: type=1400 audit(1666766810.741:452): apparmor="DENIED" operation="getattr" info="Failed name lookup - disconnected path" error=-13 namespace="root//lxd-juju-5062b7-2-lxd-3_<var-snap-lxd-common-lxd>" profile="/usr/sbin/tcpdump" name="apparmor/.null" pid=257511 comm="tcpdump" requested_mask="r" denied_mask="r" fsuid=1000108 ouid=0

However the same fix makes this go away. Is apparmor or this error message failing to identify the path for some reason because it has no permission to stat it in that apparmor context or something? Also is "/dev r" a faulty permission?

It's notable that after I reload the apparmor profile, and sometimes randomly, the current terminal session has this issue go away - it seems it can resolve the path for a while. E.g. if I add and then remove the consoles abstraction, it suddenly works inside that session. But if I log out and log in again, it breaks again.

I'm a bit lost here :)

Christian Boltz (cboltz) wrote :

A few comments and explanations:

> As part of that it locks down /dev to read-only:
> /dev/ r,
>
> However that also means /dev/pts is read-only, hence the error above denies write access.

The rule for /dev/ only allows reading the directory listing of /dev/. It doesn't say or allow anything for /dev/pts/.

> [ 9119.221342] audit: type=1400 audit(1666766810.741:452): apparmor="DENIED" operation="getattr" info="Failed name lookup - disconnected path" error=-13 [...]
> name="apparmor/.null"

name="apparmor/.null" is a special replacement - IIRC for files or open file handles that should not be available inside the namespace. However, I'm not completely sure about this - maybe someone else can add a better explanation.

> Also is "/dev r" a faulty permission?

Not really faulty, just useless ;-)
This rule would allow reading the _file_ /dev, but since /dev is a directory, the rule doesn't give you any useful permissions.

You probably want "/dev/ r," which allows reading the directory listing (think of "ls -l /dev").

> It's notable that after I reload the apparmor profile, and sometimes randomly, the current terminal
> session has this issue go away - it seems it can resolve the path for a while. e.g. if i add and
> then remove the consoles abstraction, it suddenly works inside that session. But if I logout/login
> it breaks again.

Wild guess: are the lxd-related profiles autogenerated, and get overwritten when you logout/login or for other reasons?

Besides that, you could in theory hit a cache issue (even if it sounds unlikely in this case - changing a profile should also update its timestamp). If in doubt, check audit.log or syslog when the profile gets reloaded. You should see different messages if the profile loaded into the kernel actually changed or not.

John Johansen (jjohansen) wrote :

name="apparmor/.null" is used to remove access to an already open file that is being inherited and should no longer be available, for example because policy doesn't allow it. AppArmor can't just close the file because the fd number might have meaning: closing the file would free up the fd slot, which could then be filled by a new open and cause all kinds of weird issues.

lxd does auto-generate profiles, so that is a good bet as to what is happening.

Georgia Garcia (georgiag) wrote :

I tried reproducing the issue on a 22.04 VM with a 22.04 container and I got some weird behavior, not consistent with what was reported in the comments, so I would appreciate it if anyone could also take a look.

What I found is that I can only reproduce the issue when running tcpdump in --mode=non-interactive, regardless of AppArmor - I also didn't see any AppArmor denials in the logs when running the test.

I have pasted my steps in https://pastebin.canonical.com/p/8NZngJF6nm/

Simon Déziel (sdeziel) wrote :

@georgiag, the behaviour changes when you tell tcpdump to do line buffering (`-l`).

Georgia Garcia (georgiag) wrote :

Thanks, Simon, I must have missed it.
When I use --mode=non-interactive on lxc and -l on tcpdump, I don't see the issue at all.
