Merge ~andersson123/autopkgtest-cloud:worker-dont-remove-queue-item-systemctl-restart into autopkgtest-cloud:master

Proposed by Tim Andersson
Status: Merged
Merged at revision: dfc6fdd0527d4e7fd9ff477f3c85c76b704c758c
Proposed branch: ~andersson123/autopkgtest-cloud:worker-dont-remove-queue-item-systemctl-restart
Merge into: autopkgtest-cloud:master
Diff against target: 55 lines (+26/-3)
2 files modified
charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker (+23/-3)
docs/administration.rst (+3/-0)
Reviewer Review Type Date Requested Status
Paride Legovini Approve
Review via email: mp+465424@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Tim Andersson (andersson123) wrote :

Needs a delineation for systemctl stop and systemctl restart

moved to WIP

Revision history for this message
Tim Andersson (andersson123) :
Revision history for this message
Tim Andersson (andersson123) wrote :

with restart:
May 21 10:46:50 juju-113c58-stg-proposed-migration-environment-94 /home/ubuntu/autopkgtest-cloud/worker/worker[427475]: INFO: Caught SIGTERM, requesting exit
May 21 10:46:51 juju-113c58-stg-proposed-migration-environment-94 /home/ubuntu/autopkgtest-cloud/worker/worker[427475]: WARNING: autopkgtest exited with code: -15
                                                                                                                        ack-ing message: b'hello\n{"triggers": ["hello/2.10-3build1"], "submit-time": "2024-05-21 10:46:27", "uuid": "f23dfb9c-4bd0-4dda-93a6-819ee104545f"}'
May 21 10:46:51 juju-113c58-stg-proposed-migration-environment-94 /home/ubuntu/autopkgtest-cloud/worker/worker[427475]: INFO: autopkgtest has been killed with USR1, indicating a restart, not removing message hello
                                                                                                                        {"triggers": ["hello/2.10-3build1"], "submit-time": "2024-05-21 10:46:27", "uuid": "f23dfb9c-4bd0-4dda-93a6-819ee104545f"} from queue
May 21 10:46:51 juju-113c58-stg-proposed-migration-environment-94 /home/ubuntu/autopkgtest-cloud/worker/worker[427475]: INFO: Killing openstack server with uuid f23dfb9c-4bd0-4dda-93a6-819ee104545f
May 21 10:46:53 juju-113c58-stg-proposed-migration-environment-94 /home/ubuntu/autopkgtest-cloud/worker/worker[427475]: INFO: Deleted test server: adt-noble-amd64-hello-20240521-104627-juju-113c58-stg-proposed-migration-environment-94-f23dfb9c-4bd0-4dda-93a6-819ee104545f

Revision history for this message
Tim Andersson (andersson123) wrote :

with stop:

May 21 10:48:14 juju-113c58-stg-proposed-migration-environment-94 systemd[1]: Stopping autopkgtest worker lcy02-1...
May 21 10:48:14 juju-113c58-stg-proposed-migration-environment-94 systemd[1]: <email address hidden>: Main process exited, code=killed, status=9/KILL

That's not great...

Revision history for this message
Tim Andersson (andersson123) wrote :

Well, actually, once started again, it picks up the same message, so having SIGKILL might be good... Although too brute force actually I think

Revision history for this message
Tim Andersson (andersson123) wrote :

Hm, same with USR1

May 21 10:53:39 juju-113c58-stg-proposed-migration-environment-94 systemd[1]: Stopping autopkgtest worker lcy02-1...
May 21 10:53:39 juju-113c58-stg-proposed-migration-environment-94 systemd[1]: <email address hidden>: Main process exited, code=killed, status=10/USR1

Revision history for this message
Tim Andersson (andersson123) wrote :

Still goes back into queue however.

Revision history for this message
Tim Andersson (andersson123) wrote :

Okay. Ready for review. All tested in staging.

Revision history for this message
Tim Andersson (andersson123) wrote :

Note to reviewer - I know the logic of the block delineating the exit codes could be simplified a bit, but I think the explicit logging will be helpful and it's most readable in its current state.

Revision history for this message
Paride Legovini (paride) :
review: Needs Information
Revision history for this message
Tim Andersson (andersson123) :
Revision history for this message
Paride Legovini (paride) wrote :

One comment, for the rest lgtm.

Revision history for this message
Tim Andersson (andersson123) wrote :

Hey, amended! The only thing different here, apart from removing those systemd unit configs, is that I also changed it so that any edge cases are also requeued. I think this is a good thing to do by default. This is NOT related to permanent looping, however. This is different entirely. If something kills the process we're unaware of, or in an odd way, we don't want to lose the test request.

Ready for re-review!

Revision history for this message
Paride Legovini (paride) :
review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker b/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker
2index 809f7ad..0173fe5 100755
3--- a/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker
4+++ b/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker
5@@ -1332,13 +1332,33 @@ def request(msg):
6 # tests can be requested to be killed with:
7 # kill -15 $pid
8 logging.warning(
9- "autopkgtest exited with code: %i\nack-ing message: %s",
10+ "autopkgtest exited with code: %i",
11 code,
12- body.encode(),
13 )
14 running_test = False
15 # we ack the message so it doesn't go back in the queue
16- msg.channel.basic_ack(msg.delivery_tag)
17+ if code == -15:
18+ logging.info(
19+ "autopkgtest has been killed with SIGTERM, indicating a restart or stop, requeue-ing message %s"
20+ % body
21+ )
22+ msg.channel.basic_reject(
23+ msg.delivery_tag, requeue=True
24+ )
25+ elif code == -9:
26+ logging.info(
27+ "autopkgtest has been killed intentionally, removing message %s from the queue"
28+ % body
29+ )
30+ msg.channel.basic_ack(msg.delivery_tag)
31+ else:
32+ logging.info(
33+ "autopkgtest has hit an obscure failure mode, removing message %s from the queue"
34+ % body
35+ )
36+ msg.channel.basic_reject(
37+ msg.delivery_tag, requeue=True
38+ )
39 logging.info(
40 "Killing openstack server with uuid %s",
41 test_uuid,
42diff --git a/docs/administration.rst b/docs/administration.rst
43index e899b03..302fcc7 100644
44--- a/docs/administration.rst
45+++ b/docs/administration.rst
46@@ -485,6 +485,9 @@ In order to kill a currently running test, grab the test uuid. This can be seen
47
48 ps aux | grep runner | grep $uuid
49 # grab the PID from the process
50+ # If you want to kill the test and remove the test request from the queue:
51+ kill -9 $pid
52+ # If you want to kill the test and preserve the test request in the queue:
53 kill -15 $pid
54
55 This will kill the autopkgtest process, and then the worker will `ack` the test request

Subscribers

People subscribed via source and target branches