Merge lp:~stgraber/apport/pidns-support into lp:~apport-hackers/apport/trunk

Proposed by Stéphane Graber
Status: Merged
Merged at revision: 2747
Proposed branch: lp:~stgraber/apport/pidns-support
Merge into: lp:~apport-hackers/apport/trunk
Diff against target: 58 lines (+20/-4)
2 files modified
data/apport (+19/-3)
etc/init.d/apport (+1/-1)
To merge this branch: bzr merge lp:~stgraber/apport/pidns-support
Reviewer Review Type Date Requested Status
Martin Pitt (community) Approve
Review via email: mp+200893@code.launchpad.net

Description of the change

This adds support for PID namespaces in apport.
With that change, crashes originating from a container on a system running a >= 3.12 kernel will be automatically redirected to apport inside the container.

This is done through a new wrapper called apport-pidns-wrapper and a new feature I introduced in core_pattern upstream (%P).

This script will be called with an extended core pattern which includes %P (PID in the host namespace), if that one doesn't match with %p (PID in the calling namespace), the script will look for /proc/<pid in host ns>/root/usr/share/apport/apport and if it exists, call it through chroot.

On pre-3.12 systems, all crashes will simply be forwarded to apport on the host (as used to be the case).
If apport isn't installed in the calling namespace, the wrapper will simply exit 0 (instead of the current behaviour of either spamming apport.log with useless content and stacktraces or worse, getting apport to think a perfectly sane process just crashed).

On top of that change, the change below will be needed in the distribution packaging:
=== modified file 'debian/apport.install'
--- debian/apport.install 2013-08-19 14:25:01 +0000
+++ debian/apport.install 2014-01-08 19:20:40 +0000
@@ -1,6 +1,7 @@
 etc
 usr/share/apport/apport
 usr/share/apport/apport-checkreports
+usr/share/apport/apport-pidns-wrapper
 usr/share/apport/package_hook
 usr/share/apport/kernel_crashdump
 usr/share/apport/kernel_oops

=== modified file 'debian/apport.upstart'
--- debian/apport.upstart 2013-07-08 14:50:45 +0000
+++ debian/apport.upstart 2014-01-08 19:22:37 +0000
@@ -30,7 +30,7 @@
         rm -f /var/lib/pm-utils/resume-hang.log
     fi

- echo "|/usr/share/apport/apport %p %s %c" > /proc/sys/kernel/core_pattern
+ echo "|/usr/share/apport/apport-pidns-wrapper %p %s %c %P" > /proc/sys/kernel/core_pattern
     echo 2 > /proc/sys/fs/suid_dumpable
 end script

Let me know if you have any question.

To post a comment you must log in.
Revision history for this message
Martin Pitt (pitti) wrote :

Thanks Stéphane! The general logic/approach looks fine to me, it's nice to get this working.

To be honest I'm not too thrilled about introducing a completely new shell wrapper for this. In the "normal" case of a non-containerized desktop system it adds further overhead to a crash (the extra shell call), and the script is very hard to test automatically, even for basic things like syntax errors. Also, it hardcodes the installed path of apport, which doesn't respect the install/prefix options of setup.py and makes it hard to test changes to data/apport in the source tree.

Could we put the "%p != %P → call apport in the container" logic into data/apport itself? With os.execl() this is not any more difficult than in shell, it would make this more accessible to integrating with the existing tests, and completely avoids the hardcoded path.

This would also potentially allow us to go a step further and instead of os.execl() just do os.chroot(), so that the host's "apport" script is used in the containers. data/apport doesn't change that often really, and the .crash format has never been broken in the entire apport history (if it does, we'd need to drop the chroot()ing, and go back to exec'ing). What do you think about that?

If all that is impractical, I'd like the new shell wrapper to have "set -euC", and the hardcoded path be replaced with a macro that setup.py install turns into the destination path.

Thanks!

review: Needs Fixing
lp:~stgraber/apport/pidns-support updated
2743. By Stéphane Graber

Add support for PID namespaces (a.k.a. apport in LXC containers).

Revision history for this message
Stéphane Graber (stgraber) wrote :

Branch updated.

There are two obvious potential problems:
 1) Apport in the container isn't in /usr/share/apport/apport. There's unfortunately not much we can do but maybe hardcode a few known locations (if we know other distros tend to ship it in some other place).
 2) The chroot command isn't in /usr/sbin/chroot (it very well could be /sbin/chroot). I tried using execvp to solve that one, however this didn't work, probably because PATH isn't set or is empty.

Those two won't be a problem now but may be worth keeping in mind and addressing eventually.

Revision history for this message
Martin Pitt (pitti) wrote :

> 1) Apport in the container isn't in /usr/share/apport/apport. There's unfortunately not much we can do but maybe hardcode a few known locations (if we know other distros tend to ship it in some other place).

Nevermind, it's hardcoded in the upstream init.d script as well. But your __file__ solution sounds fine, so as long as you use the same distro as host and guest it should be fine. They have to have the same path anyway, as the whole system only has one core_pattern pipeline.

> 2) The chroot command isn't in /usr/sbin/chroot (it very well could be /sbin/chroot). I tried using execvp to solve that one, however this didn't work, probably because PATH isn't set or is empty.

That's correct, the kernel invokes the core dump pipeline without any environment. If that becomes a problem, data/apport could os.environ['PATH'] to a sensible value itself.

This looks a lot nicer, thank you! I'll add an error_log() when successfully forwarding to a container as well, at least for the initial deployment (to make debugging easier).

review: Approve
Revision history for this message
Martin Pitt (pitti) wrote :

Merged with some minor cleanups in http://bazaar.launchpad.net/~apport-hackers/apport/trunk/revision/2747.

I tested this with a saucy LXC container, and it works very well, nice work! The main pitfall that I noticed is that as soon as you start/stop a container, or install/remove apport in the container, that is going to change the global /proc/sys/kernel/core_pattern. I. e. if you run a container from a previous Ubuntu release, that's going to drop the %P again, or completely shut off apport globally if you shut down the container (due to "stop apport"). To solve that, we would need to fix the init script/upstart job to not change core_pattern when it's running in a container, and SRU that change to at least precise. How can this "running in a container" detection be done?

Thanks!

Revision history for this message
Martin Pitt (pitti) wrote :

> How can this "running in a container" detection be done?

Is [ -e /run/container_type ] adequate for this?

Revision history for this message
Martin Pitt (pitti) wrote :

Ah, I found this in qemu:

        grep -zqs ^container= /proc/1/environ && exit 0

/bin/running-in-container anad the "container" upstart signal are upstart specific, but the above sounds just fine.

I reported bug 1267728 about that (for SRU purposes), and will commit a fix for this now.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'data/apport'
--- data/apport 2013-10-25 04:17:18 +0000
+++ data/apport 2014-01-09 18:39:42 +0000
@@ -248,9 +248,9 @@
248#248#
249#################################################################249#################################################################
250250
251if len(sys.argv) != 4:251if len(sys.argv) not in (4, 5):
252 try:252 try:
253 print('Usage: %s <pid> <signal number> <core file ulimit>' % sys.argv[0])253 print('Usage: %s <pid> <signal number> <core file ulimit> [global pid]' % sys.argv[0])
254 print('The core dump is read from stdin.')254 print('The core dump is read from stdin.')
255 except IOError:255 except IOError:
256 # sys.stderr might not actually exist, expecially not when being called256 # sys.stderr might not actually exist, expecially not when being called
@@ -259,12 +259,28 @@
259 sys.exit(1)259 sys.exit(1)
260260
261init_error_log()261init_error_log()
262
263# Check if we received a valid global PID (kernel >= 3.12). If we do,
264# then compare it with the local PID. If they don't match, it's an
265# indication that the crash originated from another PID namespace. In that
266# case, attempt to forward the crash to apport in that namespace. If
267# apport can't be found, then simply log an entry in the host error log
268# and exit 0.
269if len(sys.argv) == 5 and sys.argv[4].isdigit() and sys.argv[4] != sys.argv[1]:
270 if os.path.exists("/proc/%s/root/%s" % (sys.argv[4], __file__)):
271 os.execv("/usr/sbin/chroot", ("chroot", "/proc/%s/root/" % sys.argv[4],
272 __file__, sys.argv[1], sys.argv[2],
273 sys.argv[3]))
274 else:
275 error_log('pid %s crashed in a container without apport support' % sys.argv[4])
276 sys.exit(0)
277
262check_lock()278check_lock()
263279
264try:280try:
265 setup_signals()281 setup_signals()
266282
267 (pid, signum, core_ulimit) = sys.argv[1:]283 (pid, signum, core_ulimit) = sys.argv[1:4]
268284
269 # drop our process priority level to not disturb userspace so much285 # drop our process priority level to not disturb userspace so much
270 try:286 try:
271287
=== modified file 'etc/init.d/apport'
--- etc/init.d/apport 2013-02-27 14:50:16 +0000
+++ etc/init.d/apport 2014-01-09 18:39:42 +0000
@@ -52,7 +52,7 @@
52 rm -f /var/lib/pm-utils/resume-hang.log52 rm -f /var/lib/pm-utils/resume-hang.log
53 fi53 fi
5454
55 echo "|$AGENT %p %s %c" > /proc/sys/kernel/core_pattern55 echo "|$AGENT %p %s %c %P" > /proc/sys/kernel/core_pattern
56}56}
5757
58#58#

Subscribers

People subscribed via source and target branches