Merge lp:~jamesodhunt/ubuntu/saucy/sysvinit/log-processes-and-open-files-on-shutdown into lp:ubuntu/saucy/sysvinit

Proposed by James Hunt
Status: Rejected
Rejected by: Iain Lane
Proposed branch: lp:~jamesodhunt/ubuntu/saucy/sysvinit/log-processes-and-open-files-on-shutdown
Merge into: lp:ubuntu/saucy/sysvinit
Diff against target: 94 lines (+51/-0) (has conflicts)
3 files modified
debian/changelog (+11/-0)
debian/src/initscripts/etc/init.d/umountfs (+20/-0)
debian/src/initscripts/etc/init.d/umountroot (+20/-0)
Text conflict in debian/changelog
To merge this branch: bzr merge lp:~jamesodhunt/ubuntu/saucy/sysvinit/log-processes-and-open-files-on-shutdown
Reviewer Review Type Date Requested Status
Steve Langasek Needs Fixing
Ubuntu branches Pending
Review via email: mp+163740@code.launchpad.net

Description of the change

= Problem =

Occasionally, people report issues where the root filesystem cannot be unmounted cleanly on shutdown. To debug these issues generally involves either producing custom PPA packages, or requesting users hack around with the existing shutdown sequence (not ideal).

= Overview =

Since these shutdown issues re-occur from time-to-time (*) it makes sense to put in place a facility to aid debugging without having to produce custom PPA package builds, etc.

This MP logs running processes and open files just prior to unmounting the disks such that if problems are observed on shutdown, they can be investigated on the next boot by looking at the files below which are generated on shutdown:

- /var/log/shutdown-ps.log
- /var/log/shutdown-lsof.log

= Notes =

The lsof command can take some time to run on a normal system. However, note that this should not be a problem at shutdown in the usual case since we don't expect many processes to still be running (and hence it will only slow down the shutdown slightly in the cases where something _has_ gone wrong and hence we need to run lsof to diagnose). Note too that by using the '-lnP' options, lsof is run as fast as possible.

If accepted, this should be forwarded to Debian too.

(*) - this does not imply a regression: it may be that some combination of package versions from the full archive interact in unexpected ways in a particular release (it is not possible to test every combination).

To post a comment you must log in.
Revision history for this message
James Hunt (jamesodhunt) wrote :

Note that once this feature is available, apport hooks can make use of these files for reporting shutdown issues.

Revision history for this message
Steve Langasek (vorlon) wrote :

Thanks for taking this on, James.

I would like to see a few improvements here:
 - These logs are written unconditionally. I would recommend that we write them *only* in the error case, by doing some analysis of the open files first and logging only if we find an issue with files open for writing.
 - Fixing this to only write the logs in case of error means that the sysvinit package itself can be made to trigger apport in the case of a failure, instead of relying on this being picked up by an apport hook later. I would like to see this done. (This does raise the bar for correctness of our checks... the last time something like this was enabled in sendsigs, the false positive rate was terrible and it had to be disabled again. But we should be able to get this done accurately if we're properly checking only for files open for writing and files open on non-root filesystems.)
 - I think logging the process list is redundant. We are going to need the lsof logs for analysis, and if lsof is unavailable, I don't think the ps output is going to be sufficient for debugging: it may let us point the finger at a particular process, but won't tell us anything about why.
 - This output is being written to /var/log in the umountfs script. It's possible (though unlikely) for a file to be opened for writing *after* umountfs has run, but before umountroot is run. I think we want to capture this data at that later point, which is more definitive? Also, since /var/log may not be on the root filesystem, we can't rely on writing there from umountroot. This suggests to me that for best results, we either need to designate some place guaranteed to be on the root filesystem to write a log from umountroot (either directly in / analagous to the /forcefsck flag, or under /etc), or log this information to the console rather than the filesystem. The latter would be /potentially/ useful in the case of a read-only root filesystem, except in that case umountroot can never fail so we don't need to log anything anyway. Oh, and /usr may *also* not be on the root filesystem, which makes it problematic to call /usr/bin/lsof. Perhaps the best approach is to log any problems to /var/log from umountfs, and then again log any problems to the console from umountroot if possible.

Finally, the exact patch here is not very idiomatic:

+ # Log open files as quickly as possible
+ { `command -v lsof` && lsof -lnP > /var/log/shutdown-lsof.log && sync; } || :

This script is not 'set -e', so the { ;} || : is unnecessary and inconsistent with the style of the rest of the script. `command -v lsof` is actually buggy, because this takes the output of 'command -v' and *runs* it, resulting in a duplicate call to lsof (without the speed optimizations). And chaining with '&&' makes the flow less obvious than it could be.

I recommend writing this as:

        if command -v lsof >/dev/null 2>&1; then
            lsof -lnP > /var/log/shutdown-lsof.log
            sync
        fi

review: Needs Fixing
Revision history for this message
James Hunt (jamesodhunt) wrote :

Hi Steve,

Thanks for reviewing. I've made the suggested changes. In the umountroot case, I wonder if we should check the permissions on the root mountpoint and if they are rw also log to '/.shutdown-lsof.log'?

Revision history for this message
Colin Watson (cjwatson) wrote :

Just one drive-by nit: in new code we should use the POSIX form "head -n1" rather than the old-style "head -1".

Revision history for this message
James Hunt (jamesodhunt) wrote :
Revision history for this message
Iain Lane (laney) wrote :

Per the last comment, → rejected.

Unmerged revisions

191. By James Hunt

* debian/src/initscripts/etc/init.d/umountroot: Fix typo and use POSIX
  form of head(1) (thanks cjwatson).

190. By James Hunt

* debian/src/initscripts/etc/init.d/umountfs: Log open files on root
  filesystem just prior to unmounting filesystems to aid diagnosis of
  shutdown issues.
* debian/src/initscripts/etc/init.d/umountroot: Check again for open files
  and log to console.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'debian/changelog'
2--- debian/changelog 2013-06-04 08:53:47 +0000
3+++ debian/changelog 2013-09-27 07:40:31 +0000
4@@ -1,3 +1,4 @@
5+<<<<<<< TREE
6 sysvinit (2.88dsf-41ubuntu3) saucy; urgency=low
7
8 * Exit the ondemand script silently if we are on a system that has
9@@ -18,6 +19,16 @@
10
11 -- Steve Langasek <steve.langasek@ubuntu.com> Tue, 28 May 2013 12:23:43 -0700
12
13+=======
14+sysvinit (2.88dsf-41ubuntu2) saucy; urgency=low
15+
16+ * debian/src/initscripts/etc/init.d/umountfs: Log open files on root
17+ filesystem just prior to unmounting filesystems to aid diagnosis of
18+ shutdown issues.
19+ * debian/src/initscripts/etc/init.d/umountroot: Check again for open files
20+ and log to console.
21+
22+>>>>>>> MERGE-SOURCE
23 sysvinit (2.88dsf-41ubuntu1) saucy; urgency=low
24
25 * Merge from Debian unstable:
26
27=== modified file 'debian/src/initscripts/etc/init.d/umountfs'
28--- debian/src/initscripts/etc/init.d/umountfs 2013-05-17 06:03:10 +0000
29+++ debian/src/initscripts/etc/init.d/umountfs 2013-09-27 07:40:31 +0000
30@@ -16,6 +16,25 @@
31
32 umask 022
33
34+# Before unmounting filesystems, log all files open for writing on the root
35+# filesystem for possible analysis on the next boot since such files
36+# will cause an unclean shutdown.
37+do_log () {
38+ if command -v lsof >/dev/null 2>&1; then
39+ # capture list of open files on root mountpoint as quickly as possible
40+ output=$(lsof -lnPR /)
41+
42+ # look for first occurence of file open for writing
43+ open_files=$(echo "$output"|awk '$5 ~ /^[0-9][0-9]*[uw]/ {print;exit 0}')
44+
45+ if [ -n "$open_files" ]; then
46+ # log full lsof output
47+ echo "$output" > /var/log/shutdown-lsof.log
48+ sync
49+ fi
50+ fi
51+}
52+
53 do_stop () {
54 PROTECTED_MOUNTS="$(sed -n ':a;/^[^ ]* \/ /!{H;n;ba};{H;s/.*//;x;s/\n//;p}' /proc/mounts)"
55 WEAK_MTPTS="" # be gentle, don't use force
56@@ -119,6 +138,7 @@
57 exit 3
58 ;;
59 stop)
60+ do_log
61 do_stop
62 ;;
63 *)
64
65=== modified file 'debian/src/initscripts/etc/init.d/umountroot'
66--- debian/src/initscripts/etc/init.d/umountroot 2013-05-17 06:03:10 +0000
67+++ debian/src/initscripts/etc/init.d/umountroot 2013-09-27 07:40:31 +0000
68@@ -14,6 +14,26 @@
69
70 . /lib/lsb/init-functions
71
72+# Before unmounting root, log all files open for writing to the console
73+# (since no other locations are available for writing and root may be
74+# read-only).
75+do_log () {
76+ if command -v lsof >/dev/null 2>&1; then
77+ # capture list of open files on root mountpoint as quickly as possible
78+ output=$(lsof -lnPR /)
79+ open_files=$(echo "$output"|awk '$5 ~ /^[0-9][0-9]*[uw]/ {print}')
80+
81+ if [ -n "$open_files" ]; then
82+ echo "INFO: files open at shutdown"
83+ # show lsof header
84+ echo "$output"|head -n1
85+ # just show the open files to avoid completely
86+ # flooding the console.
87+ echo "$open_files" > /dev/console
88+ fi
89+ fi
90+}
91+
92 do_stop () {
93 [ "$VERBOSE" = no ] || log_action_begin_msg "Mounting root filesystem read-only"
94 # Ask init to re-exec itself before we go down if it has been

Subscribers

People subscribed via source and target branches

to all changes: