lp:~andersson123/autopkgtest-cloud

Owned by Tim Andersson
Get this repository:
git clone https://git.launchpad.net/~andersson123/autopkgtest-cloud
Only Tim Andersson can upload to this repository.

Branches

Name Last Modified Last Commit
worker-dont-remove-queue-item-systemctl-restart 2024-05-02 16:51:26 UTC
fix: worker: don't ack message if worker was killed with USR1 (code -15)

Author: Tim Andersson
Author Date: 2024-05-02 16:43:58 UTC

fix: worker: don't ack message if worker was killed with USR1 (code -15)

With the recent introduction [1] of an easier way to kill running tests,
we hit an unforeseen issue! Restarting the worker service with systemctl
currently falls into the block of logic that removes the openstack
server AND removes the test request from the queue - oh no!

This commit amends that by distinguishing between exit codes in this else
block. Code -15 (termination by signal 15, SIGTERM, which systemd sends
when stopping or restarting the service) is now recognised as coming from
systemd, and because of this, the message is no longer removed from the queue.
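A minimal sketch of this distinction, assuming the worker sees the test's return code the way Python's subprocess module reports it: a negative return code -N means the child was terminated by signal N, and on Linux signal 15 is SIGTERM, which systemd sends by default when stopping or restarting a service. `should_ack` is a hypothetical helper, not the worker's actual code.

```python
import signal
import subprocess


def should_ack(returncode: int) -> bool:
    """Hypothetical helper: should the AMQP test request be acked (removed)?

    A return code of -15 means the process was killed by signal 15
    (SIGTERM), e.g. by systemd restarting the worker service, so the
    request is left in the queue to be picked up again.
    """
    return returncode != -signal.SIGTERM


# A shell that sends SIGTERM to itself exits with return code -15.
proc = subprocess.run(["sh", "-c", "kill -TERM $$"])
print(proc.returncode, signal.Signals(-proc.returncode).name)  # -15 SIGTERM
print(should_ack(proc.returncode))  # False
```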

This commit also amends the docs w.r.t. killing running tests: a
different signal must now be passed to the kill command.

This commit fixes bug LP: #2064582

[1] https://git.launchpad.net/autopkgtest-cloud/commit/?id=d0e3b2ddb3ac5fe4d62f879d2e1c9ca578797120

stop-tests-from-webpage 2024-05-02 16:37:02 UTC
feat: cloud&web: add option to stop test from webpage

Author: Tim Andersson
Author Date: 2024-04-23 15:02:54 UTC

feat: cloud&web: add option to stop test from webpage

This commit introduces the functionality of being able to kill a
currently running test from the autopkgtest webpage.

*Test-killer*
It introduces a new script, test-killer, which runs on the cloud worker
units as a systemd service.

test-killer listens to requests via amqp on the "tests-to-kill"
exchange. Each message sent on this exchange contains a test uuid, and
test-killer uses that uuid to kill the test.

The initial message in the test-killer queue looks like this:
{
    "uuid": "b864593b-82e2-424e-bfe7-f37748dbd047",
    "not-running-on": []
}

The "not-running-on" list gets appended when a worker unit checks for
the test with the given uuid and the test isn't present on that specific
worker unit. test-killer appends the hostname of the current worker unit
to this list.

When the length of the "not-running-on" list is equal to the number of
worker units, the message is removed from the queue if the uuid is not
found in queues.json. In this case we assume the test has finished
before we've had a chance to kill it.

In this way, you can simply pass test-killer a uuid, and via amqp it'll
check for the test on every worker unit.
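The per-unit decision described above can be sketched as follows. This is a hypothetical illustration, not test-killer's code: how the unit learns which tests run locally, the `n_workers` and `queued_uuids` inputs, and the "requeue" action when the test is still queued (the text doesn't say what happens then) are all assumptions.

```python
from __future__ import annotations

import json


def handle_kill_request(body: bytes, hostname: str, local_uuids: set[str],
                        n_workers: int, queued_uuids: set[str]) -> tuple[str, dict]:
    """Hypothetical sketch of one test-killer step on a single worker unit.

    Returns (action, message), where action is "kill", "drop" or "requeue".
    """
    msg = json.loads(body)
    uuid = msg["uuid"]
    if uuid in local_uuids:
        # The test is running on this unit: kill it and consume the message.
        return "kill", msg
    if hostname not in msg["not-running-on"]:
        msg["not-running-on"].append(hostname)
    if len(msg["not-running-on"]) >= n_workers and uuid not in queued_uuids:
        # Every unit has looked and the test isn't queued either: assume it
        # finished before it could be killed and drop the message.
        return "drop", msg
    # Otherwise republish the updated message so the remaining units check.
    return "requeue", msg


first, msg = handle_kill_request(
    b'{"uuid": "b864593b-82e2-424e-bfe7-f37748dbd047", "not-running-on": []}',
    "worker-1", set(), n_workers=2, queued_uuids=set())
second, msg = handle_kill_request(json.dumps(msg).encode(), "worker-2", set(), 2, set())
print(first, second, msg["not-running-on"])  # requeue drop ['worker-1', 'worker-2']
```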

*web changes*
The running page now displays a link under each running job (for admins
only) which redirects to a new app under webcontrol - test-manager.

test-manager has only one endpoint, similar to request/app.py. This
endpoint is only available to a select few admins.

This list of admins is now a config option for the charm (admin-nicks).
This is in the service-bundle, with a sensible default set.

This endpoint can be passed a uuid (uuid=$uuid), which it then submits to
the tests-to-kill exchange. However, test-manager first checks that the
uuid is present in running.json, to avoid wasting resources trying to
kill a test that isn't running.

If the given uuid is found in running.json, that uuid is sent via amqp
to the test-killer services on the various worker units, where the test
is then killed.
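A sketch of the check test-manager performs before publishing. The layout of running.json isn't described here, so this hypothetical version walks the whole structure looking for values under a "uuid" key; the message body follows the format shown earlier, and the actual AMQP publish is left out.

```python
import json
from typing import Any, Optional


def find_uuids(node: Any) -> set:
    """Collect every string stored under a "uuid" key, at any depth."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "uuid" and isinstance(value, str):
                found.add(value)
            else:
                found |= find_uuids(value)
    elif isinstance(node, list):
        for item in node:
            found |= find_uuids(item)
    return found


def build_kill_message(uuid: str, running: Any) -> Optional[str]:
    """Return a tests-to-kill message body, or None if uuid isn't running."""
    if uuid not in find_uuids(running):
        return None  # nothing to kill: don't bother the worker units
    return json.dumps({"uuid": uuid, "not-running-on": []})


# Made-up running.json content; the real layout may differ.
running = {"hello": {"run-1": {"noble": {"amd64": [{"uuid": "b864593b-82e2-424e-bfe7-f37748dbd047"}]}}}}
print(build_kill_message("b864593b-82e2-424e-bfe7-f37748dbd047", running))
print(build_kill_message("not-a-running-test", running))  # None
```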

upgrade-charm-to-docs 2024-05-02 14:22:20 UTC
docs: remove mention of new dependencies requiring a unit replacement

Author: Tim Andersson
Author Date: 2024-03-28 14:39:04 UTC

docs: remove mention of new dependencies requiring a unit replacement

dont-let-cache-amqp-hang 2024-05-02 11:20:54 UTC
fix: web: utilise RuntimeMaxSec to stop cache-amqp from ever hanging

Author: Tim Andersson
Author Date: 2024-05-02 11:20:54 UTC

fix: web: utilise RuntimeMaxSec to stop cache-amqp from ever hanging

cache-amqp would previously hang when there was a fault with the
semaphore queues, a brittle mechanism which we intend to fix in the
future if we decide to move back to a distributed system for the
autopkgtest-web units.

Currently, with recent changes from Skia, we have disabled this
mechanism: since we're only running one web unit, it's no longer
necessary. However, if we move back to a distributed system in the
future, these unit changes would be beneficial, and there's no downside
to having them now.

The service type was switched from oneshot to simple for a very good
reason: RuntimeMaxSec has no effect on oneshot services [1].

[1] https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#RuntimeMaxSec=

proposed-package-images 2024-05-01 15:52:48 UTC
fix: cloud: make create-nova-image-with-proposed-package up to date

Author: Tim Andersson
Author Date: 2024-05-01 15:15:30 UTC

fix: cloud: make create-nova-image-with-proposed-package up to date

This commit introduces a new mechanism for loading creds for the
aforementioned script, as we now use a wider variety of datacentres
and the script was last updated nearly 5 years ago.

This commit also adds two new dependencies to the cloud-worker charm:
- qemu-user-static
- binfmt-support
These are required for the script to execute bash commands on VMs of a
different architecture to the host, e.g. on arm64 from an amd64 host.

It also changes the mechanism by which the desired package is installed
from proposed, and does away with the sed line that existed before.

auto-queue-cleanup 2024-05-01 15:13:41 UTC
feat: web: add queue-cleaner script

Author: Tim Andersson
Author Date: 2024-04-23 10:48:35 UTC

feat: web: add queue-cleaner script

This script removes unnecessary items from the queue by checking for
items in the queue that have triggers that aren't in the proposed
or release pockets.

It also checks the queues for any duplicate items - after a specific
item has been found, if another queue item has the same parameters,
it is removed from the queue.
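The two cleanup rules can be sketched in a few lines. This is a hypothetical illustration, not queue-cleaner's code: it assumes each queue item is the package name, a newline, then a JSON object of parameters, that `valid_triggers` (package/version strings currently in proposed or release) is computed elsewhere, and that one obsolete trigger is enough to make an item unnecessary.

```python
from __future__ import annotations

import json


def clean_queue(items: list[str], valid_triggers: set[str]) -> list[str]:
    """Hypothetical sketch: drop obsolete and duplicate test requests."""
    kept, seen = [], set()
    for item in items:
        package, _, raw = item.partition("\n")
        params = json.loads(raw) if raw else {}
        # Assumption: a request is obsolete if any of its triggers is no
        # longer in the proposed or release pocket.
        if any(t not in valid_triggers for t in params.get("triggers", [])):
            continue
        # Same package and same parameters as an earlier item: a duplicate.
        key = (package, json.dumps(params, sort_keys=True))
        if key in seen:
            continue
        seen.add(key)
        kept.append(item)
    return kept


queue = [
    'hello\n{"triggers": ["hello/2.10-3"]}',
    'hello\n{"triggers": ["hello/2.10-3"]}',  # duplicate
    'hello\n{"triggers": ["hello/2.10-2"]}',  # trigger superseded
]
print(clean_queue(queue, {"hello/2.10-3"}))
```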

This script runs every 15 minutes, and should help our throughput by not
having unnecessary queue items.

This commit also adds a flock call to the cache-amqp and queue-cleaner
services to ensure that these services never run at the same time,
so as to avoid any issues with parallel reads of the queues.
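A sketch of the mutual exclusion a shared flock gives, using Python's `fcntl.flock` rather than the services' own flock call. The lock path is hypothetical, and whether the real services wait for the lock or give up isn't stated; the non-blocking `LOCK_NB` flag is used here just to make the conflict visible.

```python
import fcntl
import os
import tempfile


def try_lock(path: str):
    """Try to take an exclusive, non-blocking flock() on path.

    Returns the open file (keep it open to hold the lock), or None if
    another open file description already holds the lock.
    """
    f = open(path, "w")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f


lock_path = os.path.join(tempfile.mkdtemp(), "queues.lock")  # hypothetical path
cache_amqp = try_lock(lock_path)      # first service gets the lock
queue_cleaner = try_lock(lock_path)   # second one is refused while it is held
print(cache_amqp is not None, queue_cleaner is None)  # True True
cache_amqp.close()                    # closing the file releases the lock
```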

filter-amqp-all-queues-option 2024-04-30 13:16:33 UTC
feat: cloud: add option to clean regex from all queues in filter-amqp

Author: Tim Andersson
Author Date: 2024-03-01 11:57:42 UTC

feat: cloud: add option to clean regex from all queues in filter-amqp

If you pass "all" instead of the queue name, filter-amqp will remove
test requests matching the regex from all queues.

This commit also renames --all to --all-items-in-queue, to avoid
confusion when running filter-amqp with the queue name set to "all".
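The "all" behaviour can be sketched over an in-memory stand-in for the queues. This is a hypothetical model, not filter-amqp itself: real queues live in RabbitMQ, and the queue names and messages below are made up.

```python
from __future__ import annotations

import re


def filter_queues(queues: dict[str, list[str]], queue_name: str, pattern: str) -> int:
    """Remove messages matching pattern from one queue, or from every queue
    when queue_name is "all". Returns how many messages were removed."""
    regex = re.compile(pattern)
    targets = list(queues) if queue_name == "all" else [queue_name]
    removed = 0
    for name in targets:
        before = len(queues[name])
        queues[name] = [m for m in queues[name] if not regex.search(m)]
        removed += before - len(queues[name])
    return removed


queues = {"queue-a": ["hello\n{}", "bash\n{}"], "queue-b": ["hello\n{}"]}
print(filter_queues(queues, "all", r"^hello"), queues)
```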

no-double-download-results 2024-04-30 13:04:24 UTC
fix: web: add hacky fix for download-results to stop duplicate results

Author: Tim Andersson
Author Date: 2024-04-26 10:23:28 UTC

fix: web: add hacky fix for download-results to stop duplicate results

fix-cache-amqp-creds 2024-04-30 10:57:09 UTC
fix: web: fix cache-amqp incorrectly parsing private jobs

Author: Tim Andersson
Author Date: 2024-04-30 09:04:22 UTC

fix: web: fix cache-amqp incorrectly parsing private jobs

Some private jobs were recently queued without the newline character
present in the test request string.

Due to the try-except we previously had here, we would fall back to
params={}. This was problematic, as for private jobs we rely on checking
the key-value pairs of the test request message to accurately identify
them as private jobs.

This commit marks all test requests that are in the incorrect format as
"malformed request" in queued.json.
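A hypothetical sketch of the stricter parsing: it assumes a test request is the package name, a newline, then a JSON object of parameters (the newline the text mentions), and reports anything else as "malformed request" instead of silently falling back to `params={}`.

```python
import json

MALFORMED = "malformed request"


def parse_request(body: str):
    """Hypothetical sketch: split a test request into (package, params).

    Assumes the body is the package name, a newline, then a JSON object of
    parameters. Anything else yields (None, MALFORMED) instead of params={}.
    """
    package, sep, raw = body.partition("\n")
    if not sep:
        return None, MALFORMED
    try:
        params = json.loads(raw)
    except json.JSONDecodeError:
        return None, MALFORMED
    if not isinstance(params, dict):
        return None, MALFORMED
    return package, params


print(parse_request('hello\n{"triggers": ["hello/2.10-3"]}'))
print(parse_request('hello {"triggers": ["hello/2.10-3"]}'))  # no newline
```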

stop-looping-fix 2024-04-29 16:08:21 UTC
fix: worker: also fake up files in the case of unidentified testbed failure

Author: Tim Andersson
Author Date: 2024-04-29 16:08:21 UTC

fix: worker: also fake up files in the case of unidentified testbed failure

seed-new-release-update-auth-version 2024-04-26 19:19:21 UTC
fix: cloud: update swift auth version for seed-new-release

Author: Tim Andersson
Author Date: 2024-04-26 19:19:21 UTC

fix: cloud: update swift auth version for seed-new-release

The auth version on the "new" bastion is 3.0.

uuid-db-column-unique-constraint 2024-04-26 13:34:15 UTC
fix: web: add UNIQUE constraint to uuid column creation in helpers/utils.py `...

Author: Tim Andersson
Author Date: 2024-04-26 13:33:33 UTC

fix: web: add UNIQUE constraint to uuid column creation in helpers/utils.py `init_db`

Whilst this doesn't fix the issue of two "write requests" going into
the sqlite-writer queue, this commit still means we get no duplicate
results.

Even in the case of two duplicate queue messages, this commit ensures
that the original entry is simply replaced with the duplicate entry.
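A sketch of how a UNIQUE uuid column turns a duplicate write into a replacement. It assumes the writer uses `INSERT OR REPLACE`, which the text doesn't show; the table and its columns are a simplified, hypothetical stand-in. With a plain `INSERT`, the constraint would reject the second write with `sqlite3.IntegrityError` instead.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Simplified, hypothetical table: only the UNIQUE uuid column mirrors the change.
db.execute("CREATE TABLE result (uuid TEXT UNIQUE, exitcode INTEGER)")

# The same result arriving twice, as with duplicate sqlite-writer messages.
for exitcode in (4, 0):
    db.execute(
        "INSERT OR REPLACE INTO result (uuid, exitcode) VALUES (?, ?)",
        ("b864593b-82e2-424e-bfe7-f37748dbd047", exitcode),
    )

rows = db.execute("SELECT uuid, exitcode FROM result").fetchall()
print(rows)  # [('b864593b-82e2-424e-bfe7-f37748dbd047', 0)]

# Without "OR REPLACE", the UNIQUE constraint rejects the second write instead.
try:
    db.execute("INSERT INTO result (uuid, exitcode) VALUES (?, ?)",
               ("b864593b-82e2-424e-bfe7-f37748dbd047", 4))
except sqlite3.IntegrityError as exc:
    print("IntegrityError:", exc)
```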

fix-killing-tests-api-version-parsing 2024-04-25 09:30:05 UTC
fix: worker: fix api version check for datacentres where this isn't explicitl...

Author: Tim Andersson
Author Date: 2024-04-25 08:52:40 UTC

fix: worker: fix api version check for datacentres where this isn't explicitly defined

Not a critical bug, but killing any tests on datacentres without the
api version explicitly defined wasn't killing the test itself in a
structured manner, but just killing the worker service.

Fixes bug LP: #2063429

three-tmpfails-no-looping-please 2024-04-24 15:26:53 UTC
fix: worker: Never, ever let tests permanently loop

Author: Tim Andersson
Author Date: 2024-04-24 13:09:50 UTC

fix: worker: Never, ever let tests permanently loop

This commit completely removes the mechanism in which a worker is
assumed to be "broken" in some way. This mechanism has in the past on
countless occasions caused tests to permanently loop, and we've
decided to kill it with fire.

temp-disable-content-length 2024-04-24 15:09:45 UTC
fix: web: disable content-length header for static files

Author: Tim Andersson
Author Date: 2024-04-24 13:13:46 UTC

fix: web: disable content-length header for static files

As of today, britney still uses the content-length header, if it is
present, when downloading autopkgtest.db.

However, after recent changes [1] to the apache2 package for focal, we've
discovered that the content-length header is no longer 100% accurate.

Because of this, this commit disables the content-length header.

[1] https://bugs.launchpad.net/ubuntu/+source/apache2/+bug/2061816

fix-tims-recent-docs 2024-04-23 15:11:28 UTC
docs: fix missing lines between code-block and code for "Resizing Volumes" ...

Author: Tim Andersson
Author Date: 2024-04-23 15:11:28 UTC

docs: fix missing lines between code-block and code for "Resizing Volumes" and "Killing Running Tests" sections

user-specific-page 2024-04-23 12:32:15 UTC
refactor: web: make browse-results.html use the results_table_core macro

Author: Tim Andersson
Author Date: 2024-04-17 17:10:32 UTC

refactor: web: make browse-results.html use the results_table_core macro

align-with-prod 2024-04-23 08:16:34 UTC
service-bundle: align with service bundle in prod

Author: Tim Andersson
Author Date: 2024-04-23 08:12:03 UTC

service-bundle: align with service bundle in prod

make-killing-tests-less-painful 2024-04-22 13:26:41 UTC
docs: add section on killing a currently running test

Author: Tim Andersson
Author Date: 2024-04-22 13:06:31 UTC

docs: add section on killing a currently running test

worker-upstream-percentage 2024-04-22 10:58:54 UTC
feat: cloud: add upstream percentage as juju config option

Author: Tim Andersson
Author Date: 2024-04-15 11:20:31 UTC

feat: cloud: add upstream percentage as juju config option

This was a feature request from Brian Murray.

Using this new feature, we can, on-the-fly, change the percentage of
jobs that will willingly take upstream tests.

This can be useful in situations where we'd like to prioritise distro
tests or prioritise upstream tests.

To modify, on the fly:

juju config autopkgtest-$type-worker worker-upstream-percentage="$perc"

Where $type is [cloud | lxd] and $perc is an integer between 1 and 100.

more-exceptions-worker-put-object 2024-04-22 09:19:16 UTC
fix: worker: Catch all exceptions in the try-except in swiftclient put_object...

Author: Tim Andersson
Author Date: 2024-04-22 08:48:40 UTC

fix: worker: Catch all exceptions in the try-except in swiftclient put_object call

We've been seeing recurring swift errors with the following traceback:

```
Apr 22 00:47:27 juju-7f2275-prod-proposed-migration-environment-3 sh[1767534]: File "/home/ubuntu/autopkgtest-cloud/worker/worker", line 1388, in request
Apr 22 00:47:27 juju-7f2275-prod-proposed-migration-environment-3 sh[1767534]: swiftclient.put_object(
...
Apr 22 00:47:27 juju-7f2275-prod-proposed-migration-environment-3 sh[1767534]: raise ConnectionError(err, request=request)
Apr 22 00:47:27 juju-7f2275-prod-proposed-migration-environment-3 sh[1767534]: requests.exceptions.ConnectionError: ('Connection aborted.', OSError("(32, 'EPIPE')"))
```

This exception isn't currently caught by the except statement, meaning
put_object won't retry in this case.
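A minimal sketch of a retry loop that catches all exceptions, as the commit describes. It isn't the worker's code: `put` is a hypothetical zero-argument callable (e.g. a `functools.partial` around the real upload), and the retry count and delay are placeholders.

```python
import time


def put_with_retries(put, retries: int = 5, delay: float = 1.0):
    """Call put() until it succeeds, retrying on *any* exception."""
    for attempt in range(1, retries + 1):
        try:
            return put()
        except Exception as exc:  # deliberately broad, see the traceback above
            if attempt == retries:
                raise
            print(f"upload failed ({exc!r}), retry {attempt}/{retries - 1}")
            time.sleep(delay)


calls = []


def flaky_upload():
    # Stand-in for the real upload: fails twice, then succeeds.
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("Connection aborted: EPIPE")
    return "uploaded"


print(put_with_retries(flaky_upload, delay=0))  # uploaded
```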

fix-trailing-whitespace 2024-04-18 13:23:38 UTC
docs: fix trailing whitespace in administration.rst

Author: Tim Andersson
Author Date: 2024-04-18 13:23:38 UTC

docs: fix trailing whitespace in administration.rst

resize-docs 2024-04-18 09:08:40 UTC
docs: add section on resizing ceph (tmp) partitions

Author: Tim Andersson
Author Date: 2024-04-15 17:00:18 UTC

docs: add section on resizing ceph (tmp) partitions

This commit adds a section detailing the process to increase the size of
our cloud worker partitions, details when this may be a pertinent step
to take, and also concisely details our findings related to the juju
units recognising the increase in disk size.

apache_logging_monitoring 2024-04-17 16:14:31 UTC
feat: web: add apache monitoring and reporting to autopkgtest-web

Author: Tim Andersson
Author Date: 2023-08-07 15:26:04 UTC

feat: web: add apache monitoring and reporting to autopkgtest-web

This commit introduces a new script, `apache-request-monitoring`, which
monitors the status codes of HTTP requests to our apache server.

It records each status code and the number of times it occurred in the
last 5 minutes.

This commit also adds the necessary service file, and adds the needed juju
config options for the charm (the influx creds).
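A sketch of the counting step, assuming the script reads Apache access-log lines in the common/combined format; how `apache-request-monitoring` actually gets its data, and the reporting to influx, are not shown.

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone

# Timestamp and status fields of Apache's common/combined log formats, e.g.
# 1.2.3.4 - - [17/Apr/2024:11:58:00 +0000] "GET / HTTP/1.1" 200 512
LOG_RE = re.compile(r'\[(?P<ts>[^\]]+)\] "[^"]*" (?P<status>\d{3}) ')


def status_counts(lines, now: datetime, window: timedelta = timedelta(minutes=5)) -> Counter:
    """Count each HTTP status code seen in the last `window` before `now`."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        if timedelta(0) <= now - ts <= window:
            counts[m["status"]] += 1
    return counts


now = datetime(2024, 4, 17, 12, 0, tzinfo=timezone.utc)
log = [
    '1.2.3.4 - - [17/Apr/2024:11:58:00 +0000] "GET / HTTP/1.1" 200 512',
    '1.2.3.4 - - [17/Apr/2024:11:59:30 +0000] "GET /nope HTTP/1.1" 404 196',
    '1.2.3.4 - - [17/Apr/2024:11:40:00 +0000] "GET / HTTP/1.1" 200 512',  # too old
]
print(status_counts(log, now))  # Counter({'200': 1, '404': 1})
```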

fix-apache-transfer-encoding-for-autopkgtest-db 2024-04-16 14:45:03 UTC
fix: web: fix no content-length header for static file endpoints

Author: Tim Andersson
Author Date: 2024-04-16 14:06:10 UTC

fix: web: fix no content-length header for static file endpoints

britney recently started having an issue with downloading the
autopkgtest.db - suddenly the static endpoint had stopped returning the
"Content-Length" header for these static get requests.

This turned out to be a recent change from a security update for the
apache2 package, see below:
https://launchpad.net/ubuntu/+source/apache2/2.4.41-4ubuntu3.17
https://launchpadlibrarian.net/724225454/apache2_2.4.41-4ubuntu3.16_2.4.41-4ubuntu3.17.diff.gz

It was verified in staging that apache2/2.4.41-4ubuntu3.17 was the
problematic version - I tested apache2/2.4.41-4ubuntu3.16 and the
Content-Length header was present when downloading the db.

The header was no longer present because apache2 now serves static files
by default with the "chunked" transfer-encoding, rather than the
"identity" transfer-encoding, which includes the Content-Length header.
The Content-Length header is not compatible with a chunked transfer
encoding (see [1]).
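To illustrate why a client that relies on Content-Length loses it here: HTTP/1.1 (RFC 9112) says Transfer-Encoding overrides Content-Length, so a chunked response gives no size up front. `expected_length` is a hypothetical helper, not britney's code.

```python
from __future__ import annotations


def expected_length(headers: dict[str, str]) -> int | None:
    """Return the body size promised by response headers, or None if unknown.

    Transfer-Encoding takes precedence over Content-Length, so a chunked
    response's size is only known once the final chunk has been read.
    """
    h = {k.lower(): v.strip() for k, v in headers.items()}
    if "transfer-encoding" in h:
        return None
    value = h.get("content-length")
    return int(value) if value is not None and value.isdigit() else None


print(expected_length({"Content-Length": "1048576"}))    # 1048576
print(expected_length({"Transfer-Encoding": "chunked"}))  # None
```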

This bug was opened:
https://bugs.launchpad.net/ubuntu/+source/apache2/+bug/2061816

After a discussion with the package maintainer, the conclusion was that
this new default behaviour is intended and is a necessary security patch.

I was pointed to this thread, which helpfully had a solution that I
modified for our web app:
https://bz.apache.org/bugzilla/show_bug.cgi?id=68872

All static files with the config present in this commit now get served
with the "identity" transfer-encoding and have the appropriate
Content-Length header.

[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Transfer-Encoding#directives

docs-add-queue-cleanup-section 2024-04-15 11:22:27 UTC
docs: add section on how to do queue cleanup of obsoleted packages

Author: Tim Andersson
Author Date: 2024-03-19 12:33:25 UTC

docs: add section on how to do queue cleanup of obsoleted packages

add-djlint-to-ci 2024-04-11 13:54:02 UTC
fix: web: reformat html templates in accordance with djlint

Author: Tim Andersson
Author Date: 2024-04-11 13:52:36 UTC

fix: web: reformat html templates in accordance with djlint

lp-2060213-fix 2024-04-11 13:02:02 UTC
fix: cloud: fix unattended upgrades interrupting lxd tests

Author: Tim Andersson
Author Date: 2024-04-11 13:02:02 UTC

fix: cloud: fix unattended upgrades interrupting lxd tests

Prior to this commit, we were seeing instances of lxd armhf tests
failing with the following:
954s E: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 763 (apt-get)
954s E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?

This is because unattended upgrades weren't disabled in our lxd armhf
runners, whereas they are for all other architectures.

This commit introduces a new setup script, setup-canonical-lxd.sh, which
runs all the necessary setup commands for our lxd-armhf runners.

This commit also moves the setup-commands from the service bundle for
the autopkgtest-lxd-worker application into the setup-canonical-lxd.sh
script. It replaces the setup-commands entry in the service bundle with
the setup script, rather than the individual commands.

Fixes bug LP: #2060213

service-message 2024-04-11 12:06:07 UTC
feat: web: add the possibility of displaying a service message with juju config

Author: Tim Andersson
Author Date: 2024-04-10 15:43:30 UTC

feat: web: add the possibility of displaying a service message with juju config

This commit adds the possibility of displaying a short service message
at the top of all pages.

This is useful in situations where there's an issue/recent bug that'll
affect all users.

To add a service message, use:

juju config autopkgtest-web important-service-message="my message"

The message is displayed with black text on a yellow background.

Displaying the service message requires restarting apache2.

backup-worker-logs 2024-04-10 10:10:24 UTC
feat: cloud-worker: Add service and timer for store-worker-logs

Author: Tim Andersson
Author Date: 2023-12-11 11:58:23 UTC

feat: cloud-worker: Add service and timer for store-worker-logs

lxd-cleanup-srv-files 2024-04-10 10:05:01 UTC
autopkgtest-cloud-worker: fix: remove old service files for lxd units

Author: Tim Andersson
Author Date: 2023-12-04 12:35:31 UTC

autopkgtest-cloud-worker: fix: remove old service files for lxd units

Whenever we introduce a new lxd remote and remove an old one, the
systemd services for the old remote are left over. This MP fixes
that, and should make debugging armhf problems a bit easier.

preserve-set-correct-content-encoding 2024-04-10 09:46:29 UTC
cloud-worker: feat: script for setting correct content encoding

Author: Tim Andersson
Author Date: 2023-12-05 16:28:19 UTC

cloud-worker: feat: script for setting correct content encoding

There was an issue when migrating our swift storage: the objects from
our old swift storage were copied successfully, but the logfiles had an
incorrect content-encoding, which meant users had to download the logs
instead of viewing them in their browser.
The script added in this commit sets the correct content encoding for all logfiles.

amend-inserts-paramstyle-named 2024-04-09 09:30:09 UTC
fix: web: use sqlite3.paramstyle = "named" where necessary

Author: Tim Andersson
Author Date: 2024-03-11 10:52:37 UTC

fix: web: use sqlite3.paramstyle = "named" where necessary

This commit makes other db write operations use the named paramstyle for
sqlite, rather than the default, which is qmark.

This makes DB inserts a bit cleaner: rather than relying on the position
of values in a tuple, named parameters are matched by name, which is much
safer, and they add the bonus of being able to pass dictionaries to DB inserts.

To quote waveform, the named paramstyle is the "One True Paramstyle"!

https://peps.python.org/pep-0249/#paramstyle
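A small comparison of the two placeholder styles with Python's sqlite3 module, on a hypothetical table. `sqlite3.paramstyle` is a module-level constant reporting the default ("qmark"); the module accepts named placeholders regardless, so a dict can be bound directly.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical table, just to compare the two placeholder styles.
db.execute("CREATE TABLE request (id INTEGER PRIMARY KEY, package TEXT, release TEXT, arch TEXT)")

# qmark: values are bound to "?" purely by position in the tuple.
db.execute(
    "INSERT INTO request (package, release, arch) VALUES (?, ?, ?)",
    ("hello", "noble", "amd64"),
)

# named: values are bound by name, so a dict can be passed as-is and the
# order of its keys does not matter.
row = {"arch": "arm64", "release": "noble", "package": "hello"}
db.execute(
    "INSERT INTO request (package, release, arch) VALUES (:package, :release, :arch)",
    row,
)

rows = db.execute("SELECT package, release, arch FROM request ORDER BY id").fetchall()
print(rows)  # [('hello', 'noble', 'amd64'), ('hello', 'noble', 'arm64')]
```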

web-fix-config-dir-permissions 2024-04-08 16:40:10 UTC
fix: web: fix file permissions of .config folder

Author: Tim Andersson
Author Date: 2024-04-08 16:24:46 UTC

fix: web: fix file permissions of .config folder

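Prior to this, only the root user had read and write access to this
directory, which is obviously less than ideal. This is due to an
inherent behaviour of pathlib: when Path.mkdir is called with
parents=True, missing parent directories are created with the default
permissions, without taking the mode argument into account, see [1].
Because of this behaviour, the .config directory had these permissions,
but the .config/autopkgtest-web/ directory had the correct permissions.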

This commit also explicitly sets the permissions for the directories,
just as an added way of ensuring these directories will have the correct
permissions.

[1] https://docs.python.org/3/library/pathlib.html#pathlib.Path.mkdir
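A sketch of the pitfall and the explicit fix. The directory names come from the text; the home directory and the 0o755 mode are hypothetical, as the intended permissions aren't stated. `mode` only applies (minus the umask) to the final directory, so both levels are chmod'ed explicitly, which the umask does not affect.

```python
import stat
import tempfile
from pathlib import Path


def make_config_dirs(home: Path, mode: int = 0o755) -> Path:
    """Create home/.config/autopkgtest-web with `mode` on both levels."""
    target = home / ".config" / "autopkgtest-web"
    # `mode` (minus the umask) only applies to `target`; a missing .config
    # parent is created with default permissions, like `mkdir -p`.
    target.mkdir(mode=mode, parents=True, exist_ok=True)
    # Set both explicitly; chmod is not affected by the umask.
    for directory in (target.parent, target):
        directory.chmod(mode)
    return target


with tempfile.TemporaryDirectory() as tmp:
    cfg = make_config_dirs(Path(tmp))
    modes = tuple(oct(stat.S_IMODE(d.stat().st_mode)) for d in (cfg.parent, cfg))
print(modes)  # ('0o755', '0o755')
```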

fix-allowed-requestor-teams 2024-04-08 16:07:40 UTC
fix: web: rstrip the team name for allowed-requestor-teams

Author: Tim Andersson
Author Date: 2024-04-08 16:04:31 UTC

fix: web: rstrip the team name for allowed-requestor-teams

Prior to this commit, whitespace characters from the indented YAML juju
config options were preserved, causing LP API calls to " example-team"
or something along those lines.

generate-charm-inventory-amendment 2024-04-08 09:37:35 UTC
fix: generate-charm-inventory: ignore all merge commits

Author: Tim Andersson
Author Date: 2024-04-08 09:37:35 UTC

fix: generate-charm-inventory: ignore all merge commits

worker-re-enable-retries 2024-04-08 08:42:23 UTC
fix: worker: re-enable retries

Author: Tim Andersson
Author Date: 2024-04-08 08:42:23 UTC

fix: worker: re-enable retries

admin-page-running-for-logtail-mismatch-heuristic 2024-04-05 16:28:16 UTC
feat: web: add tests to admin page which have a mismatch between the logtail ...

Author: Tim Andersson
Author Date: 2024-04-05 16:25:22 UTC

feat: web: add tests to admin page which have a mismatch between the logtail timestamp and running_for value

Fixes bug LP: #2058463

04052024-bos02-armhf-update 2024-04-05 16:26:39 UTC
service-bundle: modify IPs for bos02 armhf in line with recent changes

Author: Tim Andersson
Author Date: 2024-04-05 16:26:39 UTC

service-bundle: modify IPs for bos02 armhf in line with recent changes

preserve-a-p-in-queued-tests 2024-04-05 15:39:31 UTC
fix: web: show all-proposed for queued tests on results pages

Author: Tim Andersson
Author Date: 2024-04-05 09:28:56 UTC

fix: web: show all-proposed for queued tests on results pages

fix-ci-failures 2024-04-05 10:51:02 UTC
fix: web: add __init__.py to private_results/ to fix CI failures

Author: Tim Andersson
Author Date: 2024-04-05 10:51:02 UTC

fix: web: add __init__.py to private_results/ to fix CI failures

swift-cleanup 2024-04-05 10:40:34 UTC
feat: web: add script to cleanup broken swift results

Author: Tim Andersson
Author Date: 2024-04-03 13:09:01 UTC

feat: web: add script to cleanup broken swift results

fix-amqp-status-collector 2024-04-04 13:33:40 UTC
fix: web: Add RuntimeDirectoryPreserve to amqp-status-collector.service

Author: Tim Andersson
Author Date: 2024-04-04 08:06:29 UTC

fix: web: Add RuntimeDirectoryPreserve to amqp-status-collector.service

This fixes the following issue:
- When RabbitMQ is unresponsive, the amqp-status-collector script fails repeatedly
- When amqp-status-collector isn't running, the /run/amqp-status-collector/
  directory is removed, due to the behaviour of RuntimeDirectory
- When that directory is removed, running.json also gets removed
- Lots of other functionality in webcontrol depends on this file, such as
  requests and browsing the /running or results pages

This commit fixes the issue by adding RuntimeDirectoryPreserve=yes to
amqp-status-collector.service. When set to yes, this flag causes the
runtime directory not to be removed when the systemd unit stops (when
set to restart, the directory is only preserved across restarts).

improve-handling-cache-amqp-failures 2024-04-03 13:30:32 UTC
fix: web: make send_amqp_request function have a timeout

Author: Tim Andersson
Author Date: 2024-01-17 11:26:38 UTC

fix: web: make send_amqp_request function have a timeout

We were recently having some issues with production, where
requesting a test via the webpage would result in the page
eventually telling the user the server has timed out. This
was because the send_amqp_request function was failing, but
the library used to send the request has no internal timeout.

send_amqp_request was failing because the rabbitmq-server ran
out of memory.

This commit amends the above issue by adding a timeout for the
send_amqp_request function in the form of a signal handler,
via the `signal` python module.

This commit also introduces a new exception parent class,
InternalException, and a new exception, QueueDead, which is a child
class of both InternalException and WebControlException.

This exception is raised when the rabbitmq connection times out, as
described above. Now, when this happens, the exception is raised and the
error message defined in the QueueDead declaration is displayed to the
user.
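A minimal sketch of a SIGALRM-based timeout using the `signal` module, as described above. `QueueDead` here is a local stand-in for the webcontrol exception, the error text is hypothetical, and the function being timed is a placeholder for the real AMQP send. SIGALRM only works on Unix and in the main thread.

```python
import signal
import time


class QueueDead(Exception):
    """Local stand-in for the webcontrol exception of the same name."""


def call_with_timeout(func, seconds: float):
    """Run func(), raising QueueDead if it takes longer than `seconds`."""
    def on_alarm(signum, frame):
        raise QueueDead("timed out talking to RabbitMQ")

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return func()
    finally:
        # Cancel the timer and restore whatever handler was installed before.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


print(call_with_timeout(lambda: "sent", 1.0))  # sent
try:
    call_with_timeout(lambda: time.sleep(5), 0.1)  # stand-in for a hung send
except QueueDead as exc:
    print("QueueDead:", exc)
```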

allowed-teams-to-juju-config 2024-04-03 12:00:18 UTC
feat: web: move ALLOWED_TEAMS to juju config instead of being hardcoded in re...

Author: Tim Andersson
Author Date: 2024-02-27 14:48:22 UTC

feat: web: move ALLOWED_TEAMS to juju config instead of being hardcoded in request/submit.py

lxd-metrics-update 2024-04-03 11:12:44 UTC
fix: cloud: only check intended ips for autopkgtest-lxd-worker metrics

Author: Tim Andersson
Author Date: 2024-02-07 15:38:30 UTC

fix: cloud: only check intended ips for autopkgtest-lxd-worker metrics

This commit adds a fix to the lxd metrics: right now we don't have a metric
that checks whether the remotes specified in the service bundle are missing
from lxc remote list on the autopkgtest-lxd-worker.

So this checks the list of intended remotes and makes note of any intended
remotes which aren't in lxc remote list.

We also currently report on remotes which aren't specified in the service bundle,
but I think that's fine to leave in the metrics as it's indicative of issues.

This commit also writes lxc-remotes.json to the ~ directory on the lxd
worker, as the metrics script now uses this information to more
accurately report the metrics.
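The comparison boils down to two set differences. A hypothetical sketch: the remote names are made up, and how the intended remotes are loaded (e.g. from lxc-remotes.json) and how `lxc remote list` is parsed are left out.

```python
from __future__ import annotations


def remote_report(intended: set[str], configured: set[str]) -> dict[str, list[str]]:
    """Compare remotes from the service bundle with `lxc remote list`."""
    return {
        # In the service bundle but absent from lxc remote list: a problem.
        "missing": sorted(intended - configured),
        # In lxc remote list but not in the service bundle; still reported,
        # as it can indicate issues.
        "unexpected": sorted(configured - intended),
    }


intended = {"remote-a", "remote-b"}    # hypothetical
configured = {"remote-a", "remote-c"}  # hypothetical
print(remote_report(intended, configured))
# {'missing': ['remote-b'], 'unexpected': ['remote-c']}
```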

cloud-worker-tmp-cleanup 2024-04-03 11:09:56 UTC
feat: cloud: add worker tmp cleanup config

Author: Tim Andersson
Author Date: 2024-02-16 11:23:49 UTC

feat: cloud: add worker tmp cleanup config

/tmp doesn't get automatically cleaned up periodically, only on boot. This is
problematic, as any edge-case worker error that causes the worker script to
exit before cleaning up the logfile directory leaves the entire directory
in /tmp, leading to low disk space errors.

This commit introduces a config file which removes files and directories
in /tmp that haven't been modified in the last 30 days.

It adds the tmp cleanup config to the service bundle common
options for the autopkgtest-cloud-worker application.

It also adds the config option to layer.yaml.

And it also writes the cleanup config to /etc/tmpfiles.d/tmp.conf

pull-amqp-push-amqp 2024-04-03 09:21:50 UTC
feat: cloud: add pull-amqp and push-amqp scripts

Author: Tim Andersson
Author Date: 2024-03-27 12:35:30 UTC

feat: cloud: add pull-amqp and push-amqp scripts

pull-amqp is a script that pulls all messages from a queue. If the script
is passed a regex, it will only pull the messages from the queue that
match said regex. If the --empty arg is passed, it'll also remove said
messages from the queue.

push-amqp is a script that simply pushes a message to a specified queue.
The two scripts can be used in conjunction to easily shift specific
queue messages from one queue to another, removing the need to craft a
retry-autopkgtest-regressions command to shift tests between queues.

push-amqp can also be used to push messages to other queues, like the
sqlite-writer queue or the download-results queue.
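A sketch of shifting messages between queues with a pull/push pair, modelled on in-memory deques rather than RabbitMQ. The function names mirror the scripts, but their signatures here are hypothetical and are not the scripts' actual interfaces.

```python
from __future__ import annotations

import re
from collections import deque


def pull(queue: deque, pattern: str | None = None, empty: bool = False) -> list[str]:
    """Return queued messages (only those matching pattern, if given).

    With empty=True the returned messages are also removed from the queue.
    """
    regex = re.compile(pattern) if pattern else None
    matched, kept = [], deque()
    while queue:
        message = queue.popleft()
        if regex is None or regex.search(message):
            matched.append(message)
            if empty:
                continue  # drop it from the source queue
        kept.append(message)
    queue.extend(kept)
    return matched


def push(queue: deque, message: str) -> None:
    """Append one message to a queue."""
    queue.append(message)


# Shift every "hello" request from one (in-memory) queue to another.
source = deque(["hello\n{}", "bash\n{}"])
target = deque()
for message in pull(source, r"^hello", empty=True):
    push(target, message)
print(list(source), list(target))
```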

Fixes bug LP: #2059235

configparser-read-refactor 2024-04-02 17:23:40 UTC
refactor: web&cloud: replace configparser.read with configparser.read_file or...

Author: Tim Andersson
Author Date: 2024-03-08 14:17:17 UTC

refactor: web&cloud: replace configparser.read with configparser.read_file or read_string

Also refactors all duplicate usage of configparser.read to share common
functions from helpers/utils.py, and amends unit tests in line with
these changes.
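The commit doesn't say why `read` was replaced; one documented difference is that `ConfigParser.read` silently skips files it cannot open and returns the list it did read, whereas `read_file` takes an already-open file object, so a missing file fails loudly at `open()`. A sketch with a hypothetical section and key:

```python
import configparser
import io

# Hypothetical config content, just for illustration.
text = "[swift]\nauth_version = 3\n"

# read_string parses config that is already in memory.
from_string = configparser.ConfigParser()
from_string.read_string(text)

# read_file takes an open file-like object; with a real path, a missing
# file raises at open() time instead of being skipped.
from_file = configparser.ConfigParser()
from_file.read_file(io.StringIO(text))

# read() silently skips paths it can't open and returns the ones it read.
from_paths = configparser.ConfigParser()
read_ok = from_paths.read(["/nonexistent/example.conf"])

print(from_string["swift"]["auth_version"], from_file.get("swift", "auth_version"), read_ok)
# 3 3 []
```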

bump-workers-10-percent 2024-04-02 16:48:32 UTC
service-bundle: bump all n-workers by 10% in line with recent quota changes

Author: Tim Andersson
Author Date: 2024-04-02 16:48:10 UTC

service-bundle: bump all n-workers by 10% in line with recent quota changes

rabbitmq-cleanup-fix 2024-04-02 09:40:14 UTC
fix: mojo: Fix deployment of rabbitmq cleanup script

Author: Tim Andersson
Author Date: 2024-04-02 09:03:28 UTC

fix: mojo: Fix deployment of rabbitmq cleanup script

Prior to this commit, the rabbitmq cleanup script wasn't getting copied
over properly to the rabbitmq unit, causing the script to fail when
rabbitmq required restarting.

This commit creates a file for the rabbitmq cleanup script, in the
autopkgtest-cloud repo, instead of just in the deployment script. The
script now gets deployed as intended and functions as intended.

drop-all-all-proposed 2024-03-28 15:04:51 UTC
fix: web: remove all all-proposed for noble

Author: Tim Andersson
Author Date: 2024-03-28 13:56:33 UTC

fix: web: remove all all-proposed for noble

exit-code-14 2024-03-28 14:19:01 UTC
fix: web: fix display of exit code 14 tests

Author: Tim Andersson
Author Date: 2024-03-28 14:19:01 UTC

fix: web: fix display of exit code 14 tests

We received a ping on IRC on 27/03/2024 about "exit code 14"
being displayed on the results pages, which is somewhat confusing
and nondescript.

This MP fixes this by giving exit code 14 tests a human-readable
description that matches the autopkgtest man page.

login-button 2024-03-28 10:48:11 UTC
feat: web: share login details between browse.cgi and request.cgi

Author: Tim Andersson
Author Date: 2024-03-27 15:08:28 UTC

feat: web: share login details between browse.cgi and request.cgi

This commit makes our flask app share the flask session between
browse.cgi and request.cgi.

Users, when not logged in, will now see a "Login" button on the navbar.
Clicking this will log them in using the pre-existing mechanism in
request.cgi.

postfix-systemd-fail-strings 2024-03-28 10:10:24 UTC
fix: worker: add fail strings for systemd failures and postfix failures

Author: Tim Andersson
Author Date: 2024-03-28 10:10:24 UTC

fix: worker: add fail strings for systemd failures and postfix failures

These failures seem to cause forever looping tests, so we need to
correctly recognise them as fail strings.

reduce-retries 2024-03-27 14:48:12 UTC
fix: worker: reduce retries

Author: Tim Andersson
Author Date: 2024-03-25 15:07:57 UTC

fix: worker: reduce retries

This commit reduces the number of retries in prod by acknowledging
all testbed failures as real failures unless they have temporary
testbed failure strings present in the logs.

integration-tests 2024-03-26 14:49:52 UTC
feat: web: add infrastructure testing endpoints to browse.cgi

Author: Tim Andersson
Author Date: 2024-01-15 13:32:48 UTC

feat: web: add infrastructure testing endpoints to browse.cgi

This commit adds two new endpoints to browse.cgi.

endpoint: post-infrastructure-results
This endpoint is for units in the autopkgtest-cloud environment to
report the results of their infrastructure tests. The endpoint checks
that the poster is authenticated (using a key on the relevant
machines) and saves the results of the infrastructure tests to a JSON
file on disk in an appropriate format.

endpoint: infrastructure-test-results
This endpoint returns the results of the infrastructure tests
within the autopkgtest-cloud environment by reading the
aforementioned JSON file from disk.
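
Below is a framework-free sketch of the two pieces the endpoints need: a key check and a JSON results file. It assumes the poster sends a shared key and a JSON-serialisable results dict. The function names, the per-unit `{"results": ..., "updated": ...}` layout and the atomic-rename approach are all assumptions. Flask routing is omitted.

```python
import hmac
import json
import os
import tempfile
from datetime import datetime, timezone


def is_authorised(presented_key: str, valid_keys: list) -> bool:
    """Compare the presented key against each configured key in constant time."""
    return any(hmac.compare_digest(presented_key.encode(), k.encode()) for k in valid_keys)


def save_results(path: str, unit: str, results: dict) -> None:
    """Merge one unit's results into the JSON file and replace it atomically."""
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except FileNotFoundError:
        data = {}
    data[unit] = {
        "results": results,
        "updated": datetime.now(timezone.utc).isoformat(),
    }
    # Write a temp file in the same directory, then rename it over the old one,
    # so a reader never sees a half-written file.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
    os.chmod(tmp, 0o644)  # mkstemp creates 0600; let the web server read it
    os.replace(tmp, path)


def load_results(path: str) -> dict:
    """What the read-only endpoint would return."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```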

add-data-dir-option-to-browse-test 2024-03-26 14:02:07 UTC
feat: web: tests: add data-dir option to browse-test.py

Author: Tim Andersson
Author Date: 2024-03-26 14:00:23 UTC

feat: web: tests: add data-dir option to browse-test.py

This commit adds a --data-dir option to browse-test, so the user
can pass a single directory and the script will assume queued.json,
running.json and autopkgtest.db are in it.
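
Here is a minimal argparse sketch of the option. Only `--data-dir` and the three file names come from the commit. The per-file flags `--queued`, `--running` and `--database` are hypothetical. So is the choice to let an explicit per-file flag win over `--data-dir`.

```python
import argparse
import os


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="browse-test.py (sketch)")
    parser.add_argument(
        "--data-dir",
        help="directory holding queued.json, running.json and autopkgtest.db",
    )
    # Hypothetical per-file options; the real script's flag names may differ.
    parser.add_argument("--queued")
    parser.add_argument("--running")
    parser.add_argument("--database")
    args = parser.parse_args(argv)
    if args.data_dir:
        # Fill in any path not given explicitly from the data directory.
        defaults = {
            "queued": "queued.json",
            "running": "running.json",
            "database": "autopkgtest.db",
        }
        for attr, filename in defaults.items():
            if getattr(args, attr) is None:
                setattr(args, attr, os.path.join(args.data_dir, filename))
    return args
```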

make-setup-command2-optional 2024-03-26 13:28:07 UTC
feat: cloud: make setup_commands optional instead of required

Author: Tim Andersson
Author Date: 2024-03-26 13:24:38 UTC

feat: cloud: make setup_commands optional instead of required

If you now set setup_command or setup_command2 to an empty
string (""), the corresponding juju config variable is not written
to the worker config files.
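
A minimal sketch of the rule follows. It assumes the charm renders these options as `key = value` lines. The rendering format and the function name are hypothetical.

```python
def render_setup_commands(config: dict) -> str:
    """Render setup_command options, omitting any that are empty or unset."""
    lines = []
    for key in ("setup_command", "setup_command2"):
        value = config.get(key, "")
        if value:  # "" means the option is not written to the worker config at all
            lines.append(f"{key} = {value}")
    return "\n".join(lines)
```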

api-key-loading-fix 2024-03-26 12:38:10 UTC
fix: web: load api-keys properly

Author: Tim Andersson
Author Date: 2024-03-26 12:38:10 UTC

fix: web: load api-keys properly

worker-private-ppa-modtig 2024-03-25 13:53:48 UTC
fix: cloud: change apt repository set up for private ppa's based on if releas...

Author: Tim Andersson
Author Date: 2024-03-22 15:24:56 UTC

fix: cloud: change apt repository set up for private ppa's based on if release is noble or not.

noble-all-proposed 2024-03-25 11:17:29 UTC
fix: web: run all noble tests with all-proposed=1

Author: Tim Andersson
Author Date: 2024-03-25 11:04:56 UTC

fix: web: run all noble tests with all-proposed=1

This is a workaround for the current state of things: a lot of tests
currently fail due to dependency issues, and running with all-proposed
should help alleviate this.
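
A minimal sketch follows. It assumes test-request parameters are handled as a dict of query arguments before the request is queued. The function name and dict shape are hypothetical.

```python
def apply_release_overrides(release: str, params: dict) -> dict:
    """Return request parameters with release-specific overrides applied."""
    params = dict(params)  # don't mutate the caller's dict
    if release == "noble":
        # Temporary workaround: take all dependencies from -proposed.
        params["all-proposed"] = "1"
    return params
```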

add-running-count 2024-03-22 12:04:09 UTC
feat: web: add a running count to the /running endpoint of browse.cgi

Author: Tim Andersson
Author Date: 2024-03-21 18:06:24 UTC

feat: web: add a running count to the /running endpoint of browse.cgi

Fixes bug LP: #2058685

queued-tests-show-uuid 2024-03-21 14:54:05 UTC
fix: web: show uuid for queued tests on package/release/arch pages instead of...

Author: Tim Andersson
Author Date: 2024-03-20 16:41:35 UTC

fix: web: show uuid for queued tests on package/release/arch pages instead of "N/A"

db-backup-add-checksum-to-filename 2024-03-20 12:06:48 UTC
feat: web: add checksum of db backup file to swift object path in db-backup s...

Author: Tim Andersson
Author Date: 2024-03-19 11:26:45 UTC

feat: web: add checksum of db backup file to swift object path in db-backup script
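
A minimal sketch follows. The commit only says the checksum becomes part of the swift object path. SHA-256, the `<prefix>/<filename>.<checksum>` layout, the `db-backups` prefix and the function name are all assumptions.

```python
import hashlib
import os


def backup_object_path(file_path: str, prefix: str = "db-backups") -> str:
    """Build a swift object path that embeds the backup file's checksum."""
    digest = hashlib.sha256()
    with open(file_path, "rb") as f:
        # Hash in 1 MiB chunks so a large database isn't loaded into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    name = os.path.basename(file_path)
    return f"{prefix}/{name}.{digest.hexdigest()}"
```

With the checksum in the name, a downloaded backup can be verified by re-hashing it and comparing the result with its object path.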

fix-package-page-nonexistent-package 2024-03-19 15:43:47 UTC
fix: web: Don't show package pages for packages that don't exist

Author: Tim Andersson
Author Date: 2024-03-19 15:20:15 UTC

fix: web: Don't show package pages for packages that don't exist

Fixes bug LP: #2058059

filter-amqp-read-creds-from-rabbitmq-cred 2024-03-19 14:55:49 UTC
fix: cloud: make filter-amqp get rabbit secrets from ~/rabbitmq.cred instead ...

Author: Tim Andersson
Author Date: 2024-03-19 14:20:11 UTC

fix: cloud: make filter-amqp get rabbit secrets from ~/rabbitmq.cred instead of command line

create-test-instances-set-ex 2024-03-19 10:00:44 UTC
feat: cloud: add set -ex to create-test-instances for verbosity

Author: Tim Andersson
Author Date: 2024-03-19 10:00:44 UTC

feat: cloud: add set -ex to create-test-instances for verbosity

time_t-armhf 2024-03-18 17:20:11 UTC
feat: web: temporarily add all-proposed=1 for armhf noble tests for time_t tr...

Author: Tim Andersson
Author Date: 2024-03-18 16:46:11 UTC

feat: web: temporarily add all-proposed=1 for armhf noble tests for time_t transition

i386-no-retry-on-build-dep-failure 2024-03-18 14:07:01 UTC
fix: cloud: don't retry on build dependency failure for i386 tests

Author: Tim Andersson
Author Date: 2024-03-18 12:30:15 UTC

fix: cloud: don't retry on build dependency failure for i386 tests

Fixes bug LP: #2058062

lxd-armhf14-lxd-armhf10-replacement 2024-03-18 10:03:15 UTC
service-bundle: replace lxd-armhf10 and lxd-armhf14

Author: Tim Andersson
Author Date: 2024-03-18 10:03:15 UTC

service-bundle: replace lxd-armhf10 and lxd-armhf14

return-uuid-upon-test-request 2024-03-15 17:15:25 UTC
fix: web: return uuid and submit time once user has successfully submitted te...

Author: Tim Andersson
Author Date: 2024-03-14 18:17:29 UTC

fix: web: return uuid and submit time once user has successfully submitted test request

replace-lxd-armhf15 2024-03-15 17:06:36 UTC
service-bundle: replace lxd-armhf15

Author: Tim Andersson
Author Date: 2024-03-15 17:06:36 UTC

service-bundle: replace lxd-armhf15

lxd-armhf-remotes-update 2024-03-15 12:59:44 UTC
service-bundle: replace a bunch of lxc remotes

Author: Tim Andersson
Author Date: 2024-03-15 12:59:44 UTC

service-bundle: replace a bunch of lxc remotes

api-key-dont-load-in-global-namespace 2024-03-14 11:33:00 UTC
fix: web: don't load API_KEYS in global namespace

Author: Tim Andersson
Author Date: 2024-03-14 11:27:42 UTC

fix: web: don't load API_KEYS in global namespace

When we load the API keys in the global namespace, we have to reload apache2
every time we add a new API key, which isn't ideal.
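
A minimal sketch follows. It assumes the keys live in a JSON file mapping user names to keys. The path, file format and function name are hypothetical. Because the file is read per call rather than at import time, a newly added key is picked up without reloading apache2.

```python
import json

API_KEYS_PATH = "/path/to/api-keys.json"  # hypothetical location


def load_api_keys(path: str = API_KEYS_PATH) -> dict:
    """Read the API keys fresh on every call instead of once at import time."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        # No keys file: behave as if no keys are configured.
        return {}
```

The trade-off is one small file read per request, which is usually negligible next to the cost of an apache2 reload.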

fix-no-api-key-logspam 2024-03-14 11:14:53 UTC
fix: web: remove warning message when api-keys file doesn't exist

Author: Tim Andersson
Author Date: 2024-03-14 11:14:53 UTC

fix: web: remove warning message when api-keys file doesn't exist

This is to avoid egregious logspam.

align-diff-from-prod 2024-03-14 11:07:03 UTC
mojo: upgrade-charm: remove unused variable

Author: Tim Andersson
Author Date: 2024-03-14 11:07:03 UTC

mojo: upgrade-charm: remove unused variable

download-results-fix-lambda 2024-03-14 09:38:26 UTC
fix: web: fix callback function syntax in download-results

Author: Tim Andersson
Author Date: 2024-03-14 09:38:26 UTC

fix: web: fix callback function syntax in download-results

browse-cgi-api-key 2024-03-13 16:00:07 UTC
docs: mention new api-key integration feature for test requests

Author: Tim Andersson
Author Date: 2024-03-13 13:52:13 UTC

docs: mention new api-key integration feature for test requests

sqlite-writer-duplicate-results-fix 2024-03-13 09:22:08 UTC
fix: web: sqlite-writer INSERT changed to INSERT OR REPLACE to avoid duplicat...

Author: Tim Andersson
Author Date: 2024-03-13 09:22:08 UTC

fix: web: sqlite-writer INSERT changed to INSERT OR REPLACE to avoid duplicate db entries
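
The following self-contained sqlite3 sketch shows why the change helps. The table and column names are hypothetical. `INSERT OR REPLACE` only deduplicates when the table has a PRIMARY KEY or UNIQUE constraint that the repeated row violates. Without one, both statements simply insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE result (
           test_id INTEGER NOT NULL,
           run_id TEXT NOT NULL,
           exitcode INTEGER,
           PRIMARY KEY (test_id, run_id)
       )"""
)

row = (1, "20240313_092208_abcde", 0)
# Writing the same result twice: a plain INSERT would raise
# sqlite3.IntegrityError on the second attempt; REPLACE overwrites instead.
for _ in range(2):
    conn.execute("INSERT OR REPLACE INTO result VALUES (?, ?, ?)", row)

count = conn.execute("SELECT COUNT(*) FROM result").fetchone()[0]
print(count)  # 1
```

Note that REPLACE deletes the conflicting row and inserts a new one. Any column not supplied in the new statement therefore reverts to its default.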

sqlite-writer-bugfixes 2024-03-12 12:20:28 UTC
fix: web: fix issues with sqlite-writer and get_test_id, as well as incorpora...

Author: Tim Andersson
Author Date: 2024-03-12 11:03:33 UTC

fix: web: fix issues with sqlite-writer and get_test_id, as well as incorporate retries for amqp
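
A generic retry helper with exponential backoff is sketched below. It assumes the amqp connect or publish step raises an exception on failure. The attempt count, delays and function name are hypothetical. In real code `func` would wrap that amqp call.

```python
import time


def with_retries(func, attempts: int = 5, base_delay: float = 1.0, exceptions=(Exception,)):
    """Call func(), retrying with exponential backoff on the given exceptions."""
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)
```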

d-a-r-make-me-faster 2024-03-11 15:57:20 UTC
refactor: web: make download-all-results use swiftclient instead of urllib

Author: Tim Andersson
Author Date: 2024-02-23 17:02:19 UTC

refactor: web: make download-all-results use swiftclient instead of urllib

With urllib, this script is rather slow and sometimes hangs
for a while before continuing.

This refactor makes download-all-results use python3-swiftclient to access
swift objects and containers, rather than downloading results from the
swift URL directly with urllib.

db-backup-fix 2024-03-11 15:51:55 UTC
fix: web: fix object_path in db-backup

Author: Tim Andersson
Author Date: 2024-03-11 15:51:55 UTC

fix: web: fix object_path in db-backup

d-r-d-a-r-merging 2024-03-11 15:32:56 UTC
feat: web: download-all-results now queues items into the sqlite-write-me exc...

Author: Tim Andersson
Author Date: 2024-02-20 18:02:32 UTC

feat: web: download-all-results now queues items into the sqlite-write-me exchange instead of inserting rows into db

This commit stops download-all-results from placing rows directly into
the database. Instead, download-all-results places the values from items it
consumes from its listener queue into the sqlite-write-me queue, and
sqlite-writer then takes these queue items and inserts them as rows into
the sqlite database.
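
Here is a single-process sketch of the new data flow. A `queue.Queue` stands in for the sqlite-write-me exchange and an in-memory sqlite3 database stands in for autopkgtest.db. The message fields, table schema and function names are hypothetical. Real code would publish and consume via AMQP.

```python
import json
import queue
import sqlite3

write_me = queue.Queue()  # stand-in for the sqlite-write-me exchange


def publish_result(item: dict) -> None:
    """download-all-results side: queue the row values instead of writing them."""
    write_me.put(json.dumps(item))


def drain_into_db(conn: sqlite3.Connection) -> int:
    """sqlite-writer side: consume queued items and insert them as rows."""
    written = 0
    while True:
        try:
            body = write_me.get_nowait()
        except queue.Empty:
            return written
        item = json.loads(body)
        conn.execute(
            "INSERT INTO result (run_id, package, exitcode) VALUES (?, ?, ?)",
            (item["run_id"], item["package"], item["exitcode"]),
        )
        conn.commit()
        written += 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (run_id TEXT PRIMARY KEY, package TEXT, exitcode INTEGER)")
publish_result({"run_id": "r1", "package": "hello", "exitcode": 0})
publish_result({"run_id": "r2", "package": "bash", "exitcode": 4})
print(drain_into_db(conn))  # 2
```

One plausible benefit of this split is that only sqlite-writer touches the database, so writes are serialised through a single process.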

update-publish-db-comment 2024-03-11 10:55:22 UTC
doc: web: update docstring for publish-db to cover all purposes of the script

Author: Tim Andersson
Author Date: 2024-03-11 10:55:22 UTC

doc: web: update docstring for publish-db to cover all purposes of the script

fix-arch-release-allow-mapping-traceback 2024-03-11 10:02:25 UTC
fix: worker: fix traceback in process_output_dir when a package is marked as ...

Author: Tim Andersson
Author Date: 2024-03-11 09:53:12 UTC

fix: worker: fix traceback in process_output_dir when a package is marked as dont_run

charm-inventory-amendments 2024-03-08 11:35:05 UTC
generate_charm_inventory: add files changed and fix merge commits being in th...

Author: Tim Andersson
Author Date: 2024-03-07 11:43:54 UTC

generate_charm_inventory: add files changed and fix merge commits being in the list

add-flavor-config-to-service-bundle-lxd 2024-03-08 10:07:39 UTC
service-bundle: declare flavors for autopkgtest-lxd-worker

Author: Tim Andersson
Author Date: 2024-03-08 10:07:39 UTC

service-bundle: declare flavors for autopkgtest-lxd-worker

ugj-check-running 2024-03-07 10:47:00 UTC
fix: web: ugj: check if job is running before completing, also check for mult...

Author: Tim Andersson
Author Date: 2024-03-05 15:35:36 UTC

fix: web: ugj: check if job is running before completing, also check for multiple matches in swift storage (this is all to handle manual re-triggers)

fix-web-reactive-hanging 2024-03-05 15:39:41 UTC
fix: web: stop web units hanging "waiting" state in mojo run

Author: Tim Andersson
Author Date: 2024-03-05 14:40:27 UTC

fix: web: stop web units hanging "waiting" state in mojo run

fix-charm-on-install 2024-03-05 11:59:18 UTC
fix: cloud & web: fix charm install failures due to conflicting sniffio version

Author: Tim Andersson
Author Date: 2024-03-05 11:48:10 UTC

fix: cloud & web: fix charm install failures due to conflicting sniffio version

Recently, installing the charms on a juju unit produced tracebacks
related to sniffio.

When installing the charm, the sniffio version was unexpectedly bumped
from 1.3.0 to 1.3.1. This conflicts with our wheelhouse files, which
specify setuptools==46.0.0 for python3.8.

sniffio 1.3.1 depends on setuptools > 64.0.0, which causes pip install
tracebacks in the juju logs.

This commit pins sniffio to 1.3.0 in the wheelhouse files, which resolves the issue.

generate-charm-inventory 2024-03-04 15:53:07 UTC
feat: dev: add generate-charm-inventory script

Author: Tim Andersson
Author Date: 2024-03-04 13:45:33 UTC

feat: dev: add generate-charm-inventory script

ugj-use-configparser-no-envfile 2024-03-01 15:45:11 UTC
refactor: web: update-github-jobs reads public-swift-creds with configparser....

Author: Tim Andersson
Author Date: 2024-02-26 12:22:25 UTC

refactor: web: update-github-jobs reads public-swift-creds with configparser.read_string rather than just using an EnvironmentFile

I think this is a better approach than passing the credentials through environment variables.
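
A minimal sketch of the technique follows. It assumes the credentials file uses `KEY=value` lines with no section header, the format an EnvironmentFile accepts. Because configparser requires a section, a dummy `[swift]` header is prepended before calling `read_string`. The key names and function name are hypothetical.

```python
import configparser


def read_swift_creds(text: str) -> dict:
    """Parse header-less KEY=value lines into a dict."""
    # interpolation=None so a '%' in a password is not treated as syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case; configparser lowercases by default
    parser.read_string("[swift]\n" + text)
    return dict(parser["swift"])
```

Unlike systemd, configparser does not strip surrounding quotes from values, so quoted entries in the file would need extra handling.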

cleanup_old_ppa_containers 2024-02-26 17:52:08 UTC
feat: autopkgtest-lxd-worker: add swift cleanup of old ppa containers

Author: Tim Andersson
Author Date: 2023-08-03 10:22:19 UTC

feat: autopkgtest-lxd-worker: add swift cleanup of old ppa containers

Swift storage is currently overloaded with results from people
testing PPAs, some of them from a very long time ago. This is less
than ideal, and after some recent discussion, 26 weeks was chosen as
the maximum age for PPA results.
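
The age check can be sketched as a pure function. It assumes the caller has already listed the PPA result containers and their last-modified timestamps. How these are fetched from swift is outside the sketch. The function name, the container names in the usage and the use of "last modified" as the age measure are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

MAX_PPA_RESULT_AGE = timedelta(weeks=26)


def expired_containers(
    last_modified: Dict[str, datetime], now: Optional[datetime] = None
) -> List[str]:
    """Return the names of containers last modified more than 26 weeks ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - MAX_PPA_RESULT_AGE
    return sorted(name for name, ts in last_modified.items() if ts < cutoff)
```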

sqlite-db-backup 2024-02-26 15:57:26 UTC
feat: web: add service files for db-backup script

Author: Tim Andersson
Author Date: 2024-02-05 16:43:35 UTC

feat: web: add service files for db-backup script

create-test-instances-refactor 2024-02-23 17:26:35 UTC
refactor: cloud: refactor create-test-instances to use common functions

Author: Tim Andersson
Author Date: 2024-02-23 13:07:19 UTC

refactor: cloud: refactor create-test-instances to use common functions

This commit refactors the create-test-instances script to use the common
functions from create-instances-common.

Overall, this makes the script more verbose, and it now tests a wider
range of functionality when creating a test instance, which should
help admins find errors more quickly.

create-instances-refactor 2024-02-23 12:30:15 UTC
change get_image function

Author: Tim Andersson
Author Date: 2024-02-23 12:30:15 UTC

change get_image function

fix-dodgy-ci 2024-02-22 12:54:24 UTC
fix: web: fix unit tests after uuid feature addition

Author: Tim Andersson
Author Date: 2024-02-22 12:51:11 UTC

fix: web: fix unit tests after uuid feature addition

fix-install-section-web-systemd-units 2024-02-21 15:28:36 UTC
fix: web: add --no-block to restart autopkgtest-web.target call in reactive p...

Author: Tim Andersson
Author Date: 2024-02-21 13:59:48 UTC

fix: web: add --no-block to restart autopkgtest-web.target call in reactive part of charm

dont-check-huge-tests-for-duplicate-queued-items 2024-02-20 18:31:59 UTC
fix: web: don't consider items in huge queue when checking duplicate test req...

Author: Tim Andersson
Author Date: 2024-02-16 17:58:23 UTC

fix: web: don't consider items in huge queue when checking duplicate test request
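
A minimal sketch follows. It assumes queued requests are available as a mapping of queue name to a list of request dicts. It also assumes huge queues can be recognised by a "huge" component in their name. The naming convention, request shape and function name are hypothetical.

```python
def is_duplicate_request(request: dict, queues: dict) -> bool:
    """Return True if an identical request is already queued outside huge queues."""
    for queue_name, items in queues.items():
        if "huge" in queue_name.split("-"):
            continue  # items in huge queues are ignored for the duplicate check
        if request in items:
            return True
    return False
```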

This change was Brian Murray's idea.
