Merge lp:~stgraber/apport/pidns-support into lp:~apport-hackers/apport/trunk

Proposed by Stéphane Graber
Status: Merged
Merged at revision: 2747
Proposed branch: lp:~stgraber/apport/pidns-support
Merge into: lp:~apport-hackers/apport/trunk
Diff against target: 58 lines (+20/-4)
2 files modified
data/apport (+19/-3)
etc/init.d/apport (+1/-1)
To merge this branch: bzr merge lp:~stgraber/apport/pidns-support
Reviewer Review Type Date Requested Status
Martin Pitt (community) Approve
Review via email: mp+200893@code.launchpad.net

Description of the change

This adds support for PID namespaces in apport.
With that change, crashes originating from a container on a system running a >= 3.12 kernel will be automatically redirected to apport inside the container.

This is done through a new wrapper called apport-pidns-wrapper and a new feature I introduced in core_pattern upstream (%P).

This script will be called with an extended core pattern which includes %P (PID in the host namespace), if that one doesn't match with %p (PID in the calling namespace), the script will look for /proc/<pid in host ns>/root/usr/share/apport/apport and if it exists, call it through chroot.

On pre-3.12 systems, all crashes will simply be forwarded to apport on the host (as used to be the case).
If apport isn't installed in the calling namespace, the wrapper will simply exit 0 (instead of the current behaviour of either spamming apport.log with useless content and stacktraces or worse, getting apport to think a perfectly sane process just crashed).

On top of that change, the change below will be needed in the distribution packaging:
=== modified file 'debian/apport.install'
--- debian/apport.install 2013-08-19 14:25:01 +0000
+++ debian/apport.install 2014-01-08 19:20:40 +0000
@@ -1,6 +1,7 @@
 etc
 usr/share/apport/apport
 usr/share/apport/apport-checkreports
+usr/share/apport/apport-pidns-wrapper
 usr/share/apport/package_hook
 usr/share/apport/kernel_crashdump
 usr/share/apport/kernel_oops

=== modified file 'debian/apport.upstart'
--- debian/apport.upstart 2013-07-08 14:50:45 +0000
+++ debian/apport.upstart 2014-01-08 19:22:37 +0000
@@ -30,7 +30,7 @@
         rm -f /var/lib/pm-utils/resume-hang.log
     fi

- echo "|/usr/share/apport/apport %p %s %c" > /proc/sys/kernel/core_pattern
+ echo "|/usr/share/apport/apport-pidns-wrapper %p %s %c %P" > /proc/sys/kernel/core_pattern
     echo 2 > /proc/sys/fs/suid_dumpable
 end script

Let me know if you have any question.

To post a comment you must log in.
Revision history for this message
Martin Pitt (pitti) wrote :

Thanks Stéphane! The general logic/approach looks fine to me, it's nice to get this working.

To be honest I'm not too thrilled about introducing a completely new shell wrapper for this. In the "normal" case of a non-containerized desktop system it adds further overhead to a crash (the extra shell call), and the script is very hard to test automatically, even for basic things like syntax errors. Also, it hardcodes the installed path of apport, which doesn't respect the install/prefix options of setup.py and makes it hard to test changes to data/apport in the source tree.

Could we put the "%p != %P → call apport in the container" logic into data/apport itself? With os.execl() this is not any more difficult than in shell, it would make this more accessible to integrating with the existing tests, and completely avoids the hardcoded path.

This would also potentially allow us to go a step further and instead of os.execl() just do os.chroot(), so that the host's "apport" script is used in the containers. data/apport doesn't change that often really, and the .crash format has never been broken in the entire apport history (if it does, we'd need to drop the chroot()ing, and go back to exec'ing). What do you think about that?

If all that is impractical, I'd like the new shell wrapper to have "set -euC", and the hardcoded path be replaced with a macro that setup.py install turns into the destination path.

Thanks!

review: Needs Fixing
lp:~stgraber/apport/pidns-support updated
2743. By Stéphane Graber

Add support for PID namespaces (a.k.a. apport in LXC containers).

Revision history for this message
Stéphane Graber (stgraber) wrote :

Branch updated.

There are two obvious potential problems:
 1) Apport in the container isn't in /usr/share/apport/apport. There's unfortunately not much we can do but maybe hardcode a few known locations (if we know other distros tend to ship it in some other place).
 2) The chroot command isn't in /usr/sbin/chroot (it very well could be /sbin/chroot). I tried using execvp to solve that one, however this didn't work, probably because PATH isn't set or is empty.

Those two won't be a problem now but may be worth keeping in mind and addressing eventually.

Revision history for this message
Martin Pitt (pitti) wrote :

> 1) Apport in the container isn't in /usr/share/apport/apport. There's unfortunately not much we can do but maybe hardcode a few known locations (if we know other distros tend to ship it in some other place).

Nevermind, it's hardcoded in the upstream init.d script as well. But your __file__ solution sounds fine, so as long as you use the same distro as host and guest it should be fine. They have to have the same path anyway, as the whole system only has one core_pattern pipeline.

> 2) The chroot command isn't in /usr/sbin/chroot (it very well could be /sbin/chroot). I tried using execvp to solve that one, however this didn't work, probably because PATH isn't set or is empty.

That's correct, the kernel invokes the core dump pipeline without any environment. If that becomes a problem, data/apport could os.environ['PATH'] to a sensible value itself.

This looks a lot nicer, thank you! I'll add an error_log() when successfully forwarding to a container as well, at least for the initial deployment (to make debugging easier).

review: Approve
Revision history for this message
Martin Pitt (pitti) wrote :

Merged with some minor cleanups in http://bazaar.launchpad.net/~apport-hackers/apport/trunk/revision/2747.

I tested this with a saucy LXC container, and it works very well, nice work! The main pitfall that I noticed is that as soon as you start/stop a container, or install/remove apport in the container, that is going to change the global /proc/sys/kernel/core_pattern. I. e. if you run a container from a previous Ubuntu release, that's going to drop the %P again, or completely shut off apport globally if you shut down the container (due to "stop apport"). To solve that, we would need to fix the init script/upstart job to not change core_pattern when it's running in a container, and SRU that change to at least precise. How can this "running in a container" detection be done?

Thanks!

Revision history for this message
Martin Pitt (pitti) wrote :

> How can this "running in a container" detection be done?

Is [ -e /run/container_type ] adequate for this?

Revision history for this message
Martin Pitt (pitti) wrote :

Ah, I found this in qemu:

        grep -zqs ^container= /proc/1/environ && exit 0

/bin/running-in-container anad the "container" upstart signal are upstart specific, but the above sounds just fine.

I reported bug 1267728 about that (for SRU purposes), and will commit a fix for this now.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'data/apport'
2--- data/apport 2013-10-25 04:17:18 +0000
3+++ data/apport 2014-01-09 18:39:42 +0000
4@@ -248,9 +248,9 @@
5 #
6 #################################################################
7
8-if len(sys.argv) != 4:
9+if len(sys.argv) not in (4, 5):
10 try:
11- print('Usage: %s <pid> <signal number> <core file ulimit>' % sys.argv[0])
12+ print('Usage: %s <pid> <signal number> <core file ulimit> [global pid]' % sys.argv[0])
13 print('The core dump is read from stdin.')
14 except IOError:
15 # sys.stderr might not actually exist, expecially not when being called
16@@ -259,12 +259,28 @@
17 sys.exit(1)
18
19 init_error_log()
20+
21+# Check if we received a valid global PID (kernel >= 3.12). If we do,
22+# then compare it with the local PID. If they don't match, it's an
23+# indication that the crash originated from another PID namespace. In that
24+# case, attempt to forward the crash to apport in that namespace. If
25+# apport can't be found, then simply log an entry in the host error log
26+# and exit 0.
27+if len(sys.argv) == 5 and sys.argv[4].isdigit() and sys.argv[4] != sys.argv[1]:
28+ if os.path.exists("/proc/%s/root/%s" % (sys.argv[4], __file__)):
29+ os.execv("/usr/sbin/chroot", ("chroot", "/proc/%s/root/" % sys.argv[4],
30+ __file__, sys.argv[1], sys.argv[2],
31+ sys.argv[3]))
32+ else:
33+ error_log('pid %s crashed in a container without apport support' % sys.argv[4])
34+ sys.exit(0)
35+
36 check_lock()
37
38 try:
39 setup_signals()
40
41- (pid, signum, core_ulimit) = sys.argv[1:]
42+ (pid, signum, core_ulimit) = sys.argv[1:4]
43
44 # drop our process priority level to not disturb userspace so much
45 try:
46
47=== modified file 'etc/init.d/apport'
48--- etc/init.d/apport 2013-02-27 14:50:16 +0000
49+++ etc/init.d/apport 2014-01-09 18:39:42 +0000
50@@ -52,7 +52,7 @@
51 rm -f /var/lib/pm-utils/resume-hang.log
52 fi
53
54- echo "|$AGENT %p %s %c" > /proc/sys/kernel/core_pattern
55+ echo "|$AGENT %p %s %c %P" > /proc/sys/kernel/core_pattern
56 }
57
58 #

Subscribers

People subscribed via source and target branches