Apport

Merge lp:~stgraber/apport/pidns-support into lp:~apport-hackers/apport/trunk

pidns-support
Merge into trunk

Proposed by Stéphane Graber on 2014-01-08

Status:	Merged
Merged at revision:	2747
Proposed branch:	lp:~stgraber/apport/pidns-support
Merge into:	lp:~apport-hackers/apport/trunk
Diff against target:	58 lines (+20/-4) 2 files modified data/apport (+19/-3) etc/init.d/apport (+1/-1)
To merge this branch:	bzr merge lp:~stgraber/apport/pidns-support
Related bugs:	Link a bug report

Reviewer	Review Type	Date Requested	Status
Martin Pitt (community)		2014-01-08	Approve on 2014-01-10
Review via email: mp+200893@code.launchpad.net

Description of the change

This adds support for PID namespaces in apport.
With that change, crashes originating from a container on a system running a >= 3.12 kernel will be automatically redirected to apport inside the container.

This is done through a new wrapper called apport-pidns-wrapper and a new feature I introduced in core_pattern upstream (%P).

This script will be called with an extended core pattern which includes %P (PID in the host namespace), if that one doesn't match with %p (PID in the calling namespace), the script will look for /proc/<pid in host ns>/root/usr/share/apport/apport and if it exists, call it through chroot.

On pre-3.12 systems, all crashes will simply be forwarded to apport on the host (as used to be the case).
If apport isn't installed in the calling namespace, the wrapper will simply exit 0 (instead of the current behaviour of either spamming apport.log with useless content and stacktraces or worse, getting apport to think a perfectly sane process just crashed).

On top of that change, the change below will be needed in the distribution packaging:
=== modified file 'debian/apport.install'
--- debian/apport.install 2013-08-19 14:25:01 +0000
+++ debian/apport.install 2014-01-08 19:20:40 +0000
@@ -1,6 +1,7 @@
etc
usr/share/apport/apport
usr/share/apport/apport-checkreports
+usr/share/apport/apport-pidns-wrapper
usr/share/apport/package_hook
usr/share/apport/kernel_crashdump
usr/share/apport/kernel_oops

=== modified file 'debian/apport.upstart'
--- debian/apport.upstart 2013-07-08 14:50:45 +0000
+++ debian/apport.upstart 2014-01-08 19:22:37 +0000
@@ -30,7 +30,7 @@
rm -f /var/lib/pm-utils/resume-hang.log
fi

- echo "|/usr/share/apport/apport %p %s %c" > /proc/sys/kernel/core_pattern
+ echo "|/usr/share/apport/apport-pidns-wrapper %p %s %c %P" > /proc/sys/kernel/core_pattern
echo 2 > /proc/sys/fs/suid_dumpable
end script

Let me know if you have any question.

Revision history for this message

Martin Pitt (pitti) wrote on 2014-01-09:

Thanks Stéphane! The general logic/approach looks fine to me, it's nice to get this working.

To be honest I'm not too thrilled about introducing a completely new shell wrapper for this. In the "normal" case of a non-containerized desktop system it adds further overhead to a crash (the extra shell call), and the script is very hard to test automatically, even for basic things like syntax errors. Also, it hardcodes the installed path of apport, which doesn't respect the install/prefix options of setup.py and makes it hard to test changes to data/apport in the source tree.

Could we put the "%p != %P → call apport in the container" logic into data/apport itself? With os.execl() this is not any more difficult than in shell, it would make this more accessible to integrating with the existing tests, and completely avoids the hardcoded path.

This would also potentially allow us to go a step further and instead of os.execl() just do os.chroot(), so that the host's "apport" script is used in the containers. data/apport doesn't change that often really, and the .crash format has never been broken in the entire apport history (if it does, we'd need to drop the chroot()ing, and go back to exec'ing). What do you think about that?

If all that is impractical, I'd like the new shell wrapper to have "set -euC", and the hardcoded path be replaced with a macro that setup.py install turns into the destination path.

Thanks!

review: Needs Fixing

lp:~stgraber/apport/pidns-support updated on 2014-01-09

2743. By Stéphane Graber on 2014-01-09: Add support for PID namespaces (a.k.a. apport in LXC containers).

Revision history for this message

Stéphane Graber (stgraber) wrote on 2014-01-09:

Branch updated.

There are two obvious potential problems:
1) Apport in the container isn't in /usr/share/apport/apport. There's unfortunately not much we can do but maybe hardcode a few known locations (if we know other distros tend to ship it in some other place).
2) The chroot command isn't in /usr/sbin/chroot (it very well could be /sbin/chroot). I tried using execvp to solve that one, however this didn't work, probably because PATH isn't set or is empty.

Those two won't be a problem now but may be worth keeping in mind and addressing eventually.

Revision history for this message

Martin Pitt (pitti) wrote on 2014-01-10:

> 1) Apport in the container isn't in /usr/share/apport/apport. There's unfortunately not much we can do but maybe hardcode a few known locations (if we know other distros tend to ship it in some other place).

Nevermind, it's hardcoded in the upstream init.d script as well. But your __file__ solution sounds fine, so as long as you use the same distro as host and guest it should be fine. They have to have the same path anyway, as the whole system only has one core_pattern pipeline.

> 2) The chroot command isn't in /usr/sbin/chroot (it very well could be /sbin/chroot). I tried using execvp to solve that one, however this didn't work, probably because PATH isn't set or is empty.

That's correct, the kernel invokes the core dump pipeline without any environment. If that becomes a problem, data/apport could os.environ['PATH'] to a sensible value itself.

This looks a lot nicer, thank you! I'll add an error_log() when successfully forwarding to a container as well, at least for the initial deployment (to make debugging easier).

review: Approve

Revision history for this message

Martin Pitt (pitti) wrote on 2014-01-10:

Merged with some minor cleanups in http://bazaar.launchpad.net/~apport-hackers/apport/trunk/revision/2747.

I tested this with a saucy LXC container, and it works very well, nice work! The main pitfall that I noticed is that as soon as you start/stop a container, or install/remove apport in the container, that is going to change the global /proc/sys/kernel/core_pattern. I. e. if you run a container from a previous Ubuntu release, that's going to drop the %P again, or completely shut off apport globally if you shut down the container (due to "stop apport"). To solve that, we would need to fix the init script/upstart job to not change core_pattern when it's running in a container, and SRU that change to at least precise. How can this "running in a container" detection be done?

Thanks!

Revision history for this message

Martin Pitt (pitti) wrote on 2014-01-10:

> How can this "running in a container" detection be done?

Is [ -e /run/container_type ] adequate for this?

Revision history for this message

Martin Pitt (pitti) wrote on 2014-01-10:

Ah, I found this in qemu:

grep -zqs ^container= /proc/1/environ && exit 0

/bin/running-in-container anad the "container" upstart signal are upstart specific, but the above sounds just fine.

I reported bug 1267728 about that (for SRU purposes), and will commit a fix for this now.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk

Subscribers

People subscribed via source and target branches

to all changes:

Brian Murray

Bruno Maximilian Voss

Martin Pitt

Ritesh Raj Sarraf

Stéphane Graber

 === modified file 'data/apport'
 --- data/apport	2013-10-25 04:17:18 +0000
 +++ data/apport	2014-01-09 18:39:42 +0000
@@ -248,9 +248,9 @@
+ #
  #################################################################
--if len(sys.argv) != 4:
++if len(sys.argv) not in (4, 5):
      try:
--        print('Usage: %s <pid> <signal number> <core file ulimit>' % sys.argv[0])
++        print('Usage: %s <pid> <signal number> <core file ulimit> [global pid]' % sys.argv[0])
          print('The core dump is read from stdin.')
      except IOError:
          # sys.stderr might not actually exist, expecially not when being called
@@ -259,12 +259,28 @@
      sys.exit(1)
  init_error_log()
++
++# Check if we received a valid global PID (kernel >= 3.12). If we do,
++# then compare it with the local PID. If they don't match, it's an
++# indication that the crash originated from another PID namespace. In that
++# case, attempt to forward the crash to apport in that namespace. If
++# apport can't be found, then simply log an entry in the host error log
++# and exit 0.
++if len(sys.argv) == 5 and sys.argv[4].isdigit() and sys.argv[4] != sys.argv[1]:
++    if os.path.exists("/proc/%s/root/%s" % (sys.argv[4], __file__)):
++        os.execv("/usr/sbin/chroot", ("chroot", "/proc/%s/root/" % sys.argv[4],
++                                      __file__, sys.argv[1], sys.argv[2],
++                                      sys.argv[3]))
++    else:
++        error_log('pid %s crashed in a container without apport support' % sys.argv[4])
++        sys.exit(0)
++
  check_lock()
  try:
      setup_signals()
--    (pid, signum, core_ulimit) = sys.argv[1:]
++    (pid, signum, core_ulimit) = sys.argv[1:4]
      # drop our process priority level to not disturb userspace so much
      try:
 === modified file 'etc/init.d/apport'
 --- etc/init.d/apport	2013-02-27 14:50:16 +0000
 +++ etc/init.d/apport	2014-01-09 18:39:42 +0000
@@ -52,7 +52,7 @@
  		rm -f /var/lib/pm-utils/resume-hang.log
  	fi
--        echo "|$AGENT %p %s %c" > /proc/sys/kernel/core_pattern
++        echo "|$AGENT %p %s %c %P" > /proc/sys/kernel/core_pattern
+ }
+ #

Apport

Merge lp:~stgraber/apport/pidns-support into lp:~apport-hackers/apport/trunk

Commit message

Description of the change

Preview Diff

Subscribers