Merge ~ubuntu-release/autopkgtest/+git/development:kill-harder into ~ubuntu-release/autopkgtest/+git/development:master

Proposed by Brian Murray
Status: Merged
Merge reported by: Brian Murray
Merged at revision: 396dc57a5b0def6b4a822c678413cf666524c98b
Proposed branch: ~ubuntu-release/autopkgtest/+git/development:kill-harder
Merge into: ~ubuntu-release/autopkgtest/+git/development:master
Diff against target: 30 lines (+18/-0)
1 file modified
lib/adt_testbed.py (+18/-0)
Reviewer Review Type Date Requested Status
Paride Legovini (community) Approve
Julian Andres Klode (community) Needs Fixing
Ubuntu Release Team Pending
Review via email: mp+423316@code.launchpad.net

Commit message

kill processes with SIGKILL if SIGTERM fails

When trying to kill long running test processes on armhf they don't always get killed so if the process is still running kill it again with SIGKILL.

Description of the change

Tests on armhf were continuing to run after a timeout was reached. I added some debugging to the autopkgtest code on the lxd worker and noticed that that SIGTERM call wasn't actually killing the process. This change uses SIGKILL if SIGTERM doesn't work out.

I cowboy'ed this in production and we can now see that the apport tests on armhf are now getting killed.

https://autopkgtest.ubuntu.com/results/autopkgtest-impish/impish/armhf/a/apport/20220524_020151_24815@/log.gz

To post a comment you must log in.
dbdb476... by Brian Murray

remove redundant return

Revision history for this message
Julian Andres Klode (juliank) wrote :

There should be some delay between those checks, as it might take a couple of seconds to complete a SIGTERM.

Alternatively, if we deem it safe to kill without waiting, we don't need to send SIGTERM in the first place and can directly send SIGKILL.

review: Needs Fixing
a35c60c... by Brian Murray

Give time for SIGTERM to work.

Revision history for this message
Julian Andres Klode (juliank) wrote :

Should we check /proc/PID/cmdline before term and before kill to make sure the PID did not get reassigned?

Certainly you do not want to sleep after the check, as that increases the chance of a race.

Races might be possible if the testbed fork bombs.

Revision history for this message
Iain Lane (laney) wrote :

How come Popen.kill() can't be used?

Anyway, I'd suggest creating a pidfd using os.pidfd_open(), storing that, and then using signal.pidfd_send_signal (python 3.9+) to avoid races like the ones described. That avoids all PID recycling errors.

Also, do forward upstream! :-)

Revision history for this message
Brian Murray (brian-murray) wrote :

> How come Popen.kill() can't be used?
>
> Anyway, I'd suggest creating a pidfd using os.pidfd_open(), storing that, and
> then using signal.pidfd_send_signal (python 3.9+) to avoid races like the ones
> described. That avoids all PID recycling errors.

The autopkgtest environment is still running on Focal which has python3.8 in main and python3.9 in universe and python3.9 is not installed. So while your idea sounds best I wonder what issues we'd run into with python3.9.

> Also, do forward upstream! :-)

da06549... by Brian Murray

Implement changes based on reviewer feedback.

Revision history for this message
Paride Legovini (paride) wrote :

A couple of inline comments, but this basically LGTM.

review: Needs Fixing
ed50ab5... by Brian Murray

Check more frequently for SIGTERM to work and use adtlog.info()

0789950... by Brian Murray

fix math

Revision history for this message
Paride Legovini (paride) wrote :

LGTM (one nit).

review: Approve
396dc57... by Brian Murray

Fix a nit regarding the variable name

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/lib/adt_testbed.py b/lib/adt_testbed.py
2index c0cbaad..d2991eb 100644
3--- a/lib/adt_testbed.py
4+++ b/lib/adt_testbed.py
5@@ -1551,7 +1551,25 @@ def killtree(pid):
6
7 for child in child_ps(pid):
8 killtree(child)
9+ with open('/proc/%s/cmdline' % pid, 'rb') as fd:
10+ cmd = fd.read()
11 try:
12 os.kill(pid, signal.SIGTERM)
13 except OSError:
14 pass
15+ for _ in range(21):
16+ if not os.path.exists('/proc/%s' % pid):
17+ return
18+ else:
19+ time.sleep(0.2)
20+ try:
21+ with open('/proc/%s/cmdline' % pid, 'rb') as fd:
22+ if cmd != fd.read():
23+ return
24+ except FileNotFoundError:
25+ return
26+ try:
27+ adtlog.info('kill with SIGTERM did not work sending SIGKILL')
28+ os.kill(pid, signal.SIGKILL)
29+ except OSError:
30+ pass

Subscribers

People subscribed via source and target branches