autopkgtest-cloud

lp:~andersson123/autopkgtest-cloud

Git
Owned by Tim Andersson

Get this repository: git clone https://git.launchpad.net/~andersson123/autopkgtest-cloud

Branches

Name	Last Modified	Last Commit
api-key-howto	2024-07-25 16:56:05 UTC 2024-07-25	docs: detail how to create an API key Author: Tim Andersson Author Date: 2024-07-22 13:40:50 UTC docs: detail how to create an API key Currently, the docs only mention how an individual can use an API key - there is no mention or guideline regarding how an admin should create an API key for a requested user, group or bot. This commit amends the issue by adding a new section which details how an `autopkgtest-cloud` admin should create an API key.
s-n-r-prepend-series-version-to-uuid	2024-07-25 15:52:25 UTC 2024-07-25	fix: seed-new-release: modify uuid before copying testinfo.json Author: Tim Andersson Author Date: 2024-07-25 13:54:58 UTC fix: seed-new-release: modify uuid before copying testinfo.json We realised recently that the uniqueness of the uuid column in the database was causing issues when running seed-new-release. The summary of the problem is as follows: - seed-new-release copies swift objects from the old to new container - the two objects, in testinfo.json within result.tar have the same uuid - we then run download-all-results for the new release - the way the results are entered into the db is with: INSERT OR REPLACE - the uuid column is unique, meaning the database entry for the old release gets REPLACE'd. - We no longer have the result for the old release in the db - If one were to then run download-all-results for the old release, the results from the new release would then get removed from the database also. This commit amends the issue by prepending the uuid in testinfo.json with the series version of the old release before uploading it to the container for the new release, preserving the database entries for both the old and new releases.
dump-service-bundle-move-to-terraform	2024-07-24 11:19:46 UTC 2024-07-24	asdf Author: Tim Andersson Author Date: 2024-07-24 11:19:46 UTC asdf
bump-bos03-s390x-quota	2024-07-23 08:48:36 UTC 2024-07-23	service-bundle: add 20 bos03-s390x workers Author: Tim Andersson Author Date: 2024-07-23 08:48:36 UTC service-bundle: add 20 bos03-s390x workers
fix-osc-lib-traceback	2024-07-22 16:41:40 UTC 2024-07-22	fix: worker: import osc_lib in a different manner Author: Tim Andersson Author Date: 2024-07-22 16:41:40 UTC fix: worker: import osc_lib in a different manner The previous method of importing osc_lib was resulting in the following traceback: ``` AttributeError: module 'osc_lib' has no attribute 'utils' ``` I tried importing osc_lib.utils in many different ways - both in a venv on my dev machine and on the servers in prod. The implementation in this commit is the one that worked both in a venv on my machine, and in a python shell in production, as well as in the worker code after a cowboy.
fix-bad-metrics-math	2024-07-22 08:58:13 UTC 2024-07-22	fix: worker: fix bad math in tools/metrics Author: Tim Andersson Author Date: 2024-07-22 08:58:13 UTC fix: worker: fix bad math in tools/metrics I recently made an MP that introduced a new metric - the cloud worker active unit percentage. I noticed this morning on the KPI that there were some negative values, and realised that the math in the previous implementation was only valid under certain circumstances, so I've modified it in this MP.
apache-request-monitoring-investigation	2024-07-12 16:40:52 UTC 2024-07-12	asdf Author: Tim Andersson Author Date: 2024-07-12 16:40:52 UTC asdf
cloud-worker-active-error-percentage	2024-07-12 09:47:43 UTC 2024-07-12	feat: cloud: add new metric for percentage of active cloud worker units Author: Tim Andersson Author Date: 2024-07-10 14:37:08 UTC feat: cloud: add new metric for percentage of active cloud worker units This commit introduces new functionality to the `metrics` script in the autopkgtest-cloud charm. It adds a new metric, just for the cloud worker units: `autopkgtest_unit_status_active_percentage` This is a new metric which we will use in grafana to alert the team when the percentage of active cloud worker units drops below 50% for a specified period of time. This couldn't be done with just pure grafana, due to limitations surrounding alerting and transformations. This has already been tested. This version of the metrics script has been running in a tmux session for a while and the panel can be seen on grafana, already active, with the alert already set up. The data is also delineated by hostname to help with debugging.
fix-ci-increase-lp-request-timeout	2024-07-12 09:33:24 UTC 2024-07-12	fix: web: don't let unit tests make api calls to launchpad Author: Tim Andersson Author Date: 2024-07-12 09:33:24 UTC fix: web: don't let unit tests make api calls to launchpad This commit refactors a couple of unit tests which weren't mocking the urllib.url_open function call - resulting in our unit tests in CI actually making api calls to Launchpad, which isn't appropriate. They now correctly mock the url_open call and no longer make queries to the Launchpad API. This explains our recent flaky CI - perhaps there was some Launchpad instability, or at least, something causing queries to take a little longer, which was in turn causing our unit tests to fail.
fix-d-a-r-bug	2024-07-10 12:51:16 UTC 2024-07-10	fix: web: fix download-all-results TypeError Author: Tim Andersson Author Date: 2024-07-10 12:44:53 UTC fix: web: fix download-all-results TypeError When running download-all-results for noble recently, the following traceback was encountered: ``` DEBUG:__main__:Fetched test result for noble/armhf/python-keycloak/3.9.0+dfsg-1 20240326_084914_e13af@ (triggers: python3-defaults/3.12.2-0ubuntu1): exit code 8 Traceback (most recent call last): File "./download-all-results", line 264, in <module> fetch_container( File "./download-all-results", line 203, in fetch_container fetch_one_result( File "./download-all-results", line 144, in fetch_one_result env_vars.append("=".join([env, value])) TypeError: sequence item 1: expected str instance, int found ``` This is because of the all_proposed environment variable being set to 1, which is then read as an integer rather than a string. This commit amends the issue by explicitly casting the type of the value in the env_vars to be a string.
swift-cleanup	2024-07-10 08:01:05 UTC 2024-07-10	feat: web: add script to cleanup broken swift results Author: Tim Andersson Author Date: 2024-04-03 13:09:01 UTC feat: web: add script to cleanup broken swift results When running seed-new-release, and copying over results from the last release to the devel release, we encountered a lot of errors in which the testinfo.json file (something created by autopkgtest) wasn't decipherable as json. This commit introduces a script which iterates through testinfo.json files in swift, and any that aren't valid json, or are corrupted in any way, are replaced with a dummy dictionary like so: ``` { "message": "This file has been added manually and is not a result of any test" } ``` This should reduce the error messages when running seed-new-release, and also, make it a bit more clear for developers browsing results - having a clear message like this instead of just an empty file is more explicit.
generate-charm-inventory-dont-list-docs-entries	2024-07-08 14:12:16 UTC 2024-07-08	fix: generate-charm-inventory: don't display docs commits Author: Tim Andersson Author Date: 2024-07-08 14:11:56 UTC fix: generate-charm-inventory: don't display docs commits As pointed out by Brian Murray, changes to the documentation of autopkgtest-cloud have no effect on the functionality of either charm. This commit amends the issue by ignoring commits with a message that begins with "docs:".
service-bundle-staging-remove-net-name	2024-07-03 13:03:39 UTC 2024-07-03	service-bundle: remove net-name variable from staging autopkgtest-cloud-worker Author: Tim Andersson Author Date: 2024-07-03 13:03:39 UTC service-bundle: remove net-name variable from staging autopkgtest-cloud-worker This was missed in a previous MP. The net-name variable can no longer be present as it's no longer an option in the charms config.yaml. Currently, this throws an error when deploying the service-bundle in staging, so this MP will amend that.
service-bundle-add-bos03-s390x-net-name-flavor	2024-07-03 12:48:22 UTC 2024-07-03	service-bundle: add flavors and net name for bos03-s390x Author: Tim Andersson Author Date: 2024-07-03 12:48:22 UTC service-bundle: add flavors and net name for bos03-s390x This commit adds the flavors for (staging) bos03-s390x, and adds the required net names for both staging and production for bos03-s390x.
bos03-ppc64el-small-n-workers-bump	2024-07-03 10:15:53 UTC 2024-07-03	service-bundle: add 20 bos03-ppc64el workers Author: Tim Andersson Author Date: 2024-07-03 10:15:53 UTC service-bundle: add 20 bos03-ppc64el workers The quota for bos03-ppc64el is limited right now. We have a quota of 80 cores currently. This cannot be increased at the moment - IS have stated that ppc64el in bos03 is currently far behind bos03-arm64 in terms of resources. Thus I've added 20 workers for bos03-ppc64el - I believe this should be okay for the time being, though we could hit quota issues in the event that ALL of these workers are using the `big` flavor but this is an unlikely occurrence.
fix-package-page-nonexistent-package	2024-06-26 13:05:37 UTC 2024-06-26	fix: web: Don't show package pages for packages that don't exist Author: Tim Andersson Author Date: 2024-03-19 15:20:15 UTC fix: web: Don't show package pages for packages that don't exist This commit changes the behaviour when a user tries to reach a package or results page for a package that doesn't exist. The results page used to throw an error stating that the package doesn't exist, however, I think this is slightly inaccurate - the package could exist, but we could just have no test results for it. This is in fact not really an error, and not something we should surface as an error. The behaviour, with this commit, is as follows: On the results and package pages, if the user goes to one of these pages for a package that doesn't exist, they get the following message: ``` Oops! Looks like this package has no previous results. The package itself may not exist - you can check by clicking the Launchpad icon. ``` I think this is better because, a, we are no longer throwing an error, and b, because the user can now validate the package exists by clicking on the Launchpad icon. Overall I think this is just an accurate representation of all the potential possibilities when going to either of these pages with an "invalid" package name. There is the possibility of checking if the package exists via Launchpad, but that'd be an http request to Launchpad every time a user views a package or results page, which is a waste of resources, and seems unnecessary. Fixes bug LP: #2058059
apache-logging-monitoring-check-error-log-too	2024-06-26 12:12:15 UTC 2024-06-26	fix: web: also check error log in apache-request-monitoring Author: Tim Andersson Author Date: 2024-06-26 12:07:00 UTC fix: web: also check error log in apache-request-monitoring This script had a fatal flaw - we were only checking /var/log/apache2/access.log and not /var/log/apache2/error.log, meaning there were no records of 500's in the data and subsequently in the grafana KPI, which is arguably one of the most important http response status codes we'd want to keep track of. This commit amends the issue by checking both the access.log and the error.log. It also makes the script a little bit more reliable as one of the functions before could return None, which isn't ideal.
autopkgtest-db-sha256-fix	2024-06-26 10:40:05 UTC 2024-06-26	fix: web: ensure that autopkgtest.db.sha256 is symlinked Author: Tim Andersson Author Date: 2024-06-26 08:30:55 UTC fix: web: ensure that autopkgtest.db.sha256 is symlinked The functionality to symlink /home/ubuntu/public/autopkgtest.db to the static directory which the flask web app serves files from was duplicated for autopkgtest.db.sha256 - however a flag wasn't added for this new statically served file, and the flag for the db being symlinked was already set, meaning the `symlink_public_db` function wasn't executed in production and thus autopkgtest.db.sha256 was never symlinked or served via the web app. In addition, symlinking the sha256 file was in the same try/except block as the db itself, where the exception was a FileExistsError. The db symlink already existed, meaning the exception was thrown, and thus the sha256 file wouldn't have been symlinked even with a separate flag as described above. This commit amends the issue by adding a second flag for the sha256 file, and symlinking the db and the sha256 file in a loop, separately. The functionality beforehand in which the public directory is created was also moved to its own try/except block.
proposed-package-images	2024-06-25 15:09:27 UTC 2024-06-25	fix: cloud: make create-nova-image-with-proposed-package up to date Author: Tim Andersson Author Date: 2024-05-01 15:15:30 UTC fix: cloud: make create-nova-image-with-proposed-package up to date This commit introduces a new mechanism of loading creds for the aforementioned script, given that we now use a wider variety of datacentres, and this script was last updated nearly 5 years ago. This commit also adds two new dependencies to the cloud-worker charm: - qemu-user-static - binfmt-support These are required for the script to execute bash commands on VMs on a different arch to the host VM - i.e. on arm64 from an amd64 host. It also modifies the mechanism in which the desired package is installed from proposed, and does away with the sed line that existed before. create-nova-image-with-proposed is a script, which hasn't been used in quite a while, which rebuilds one of our adt images, with a specified package from proposed. We had to use this recently when the version of base-files in the release pocket was breaking our tests, but the version of base-files in the proposed pocket would fix said issue.
remove-rabbitmq-auto-restart	2024-06-25 13:10:48 UTC 2024-06-25	fix: rabbitmq: remove auto-restart for rabbitmq server Author: Tim Andersson Author Date: 2024-06-25 12:54:17 UTC fix: rabbitmq: remove auto-restart for rabbitmq server For the last few years, rabbitmq was auto-restarting after using up 2GiB of ram. This was a longstanding issue, in which the root cause was addressed in the following commit: https://git.launchpad.net/autopkgtest-cloud/commit/?id=f019281e0fe38d3f298b933b9fd9fcb243795a7a The root cause of the issue was the worker code sending status updates to a queue at a rate which the consumer couldn't keep up with, causing the rabbitmq queue to grow in size in perpetuity. You can see a more complete description of the issue in the message of the commit above. That being said, we can now remove the script that sets up the service which auto-restarts the rabbitmq-server.service (with glee).
flexible-net-names	2024-06-25 10:47:16 UTC 2024-06-25	feat: cloud: make network names flexible for dc/arch combinations Author: Tim Andersson Author Date: 2024-06-24 11:28:04 UTC feat: cloud: make network names flexible for dc/arch combinations Much like the recent change to the flavor config, this commit introduces a mechanism to have specific network names for each datacentre/arch combination. It introduces a new config variable to the autopkgtest-cloud-worker charm, worker-net-names, which is a string with yaml inside of it, the same as the flavor config. The values in the yaml are inserted into the worker-$datacentre-$arch.conf file, as the net-name juju config option used to be. Having a specific datacentre/arch network name isn't required as a default is specified, which is net_$instance-proposed-migration, where instance is either "prod" or "stg". This commit also refactors part of the `write_net_names` function to use pathlib instead of writing to a file using `with open`.
stg-bos03-ppc64el-flavor-names	2024-06-25 10:08:18 UTC 2024-06-25	service-bundle: add flavor names for bos03-ppc64el in staging Author: Tim Andersson Author Date: 2024-06-25 10:08:18 UTC service-bundle: add flavor names for bos03-ppc64el in staging
metrics-worker-healthy-percentage	2024-06-24 16:39:26 UTC 2024-06-24	worker percentage wip Author: Tim Andersson Author Date: 2024-06-24 16:38:55 UTC worker percentage wip
apache-request-monitoring-timer-fix	2024-06-24 13:21:59 UTC 2024-06-24	fix: web: fix timer syntax in apache-request-monitoring.timer Author: Tim Andersson Author Date: 2024-06-24 13:16:42 UTC fix: web: fix timer syntax in apache-request-monitoring.timer This was missed in a previous MP, but the syntax in the timer file for this unit was incorrect and thus was only triggering at 5 past midnight rather than every 5 minutes. This commit amends the issue by adding the proper syntax for running the unit every 5 minutes.
add-djlint-to-ci	2024-06-24 09:38:00 UTC 2024-06-24	web: lint all templates in line with djlint now in pre-commit and CI Author: Tim Andersson Author Date: 2024-06-24 09:38:00 UTC web: lint all templates in line with djlint now in pre-commit and CI
revert-lxd-armhf-security-nesting	2024-06-19 11:54:16 UTC 2024-06-19	Revert "fix: lxd-worker: add security.nesting=true to lxd config" Author: Tim Andersson Author Date: 2024-06-19 11:54:16 UTC Revert "fix: lxd-worker: add security.nesting=true to lxd config" This reverts commit 7e2db60fdb52a81febf88f462383f557abe5b7dd.
fix-lxd-security-nesting	2024-06-19 11:52:32 UTC 2024-06-19	fix: lxd-worker: put security.nesting: true in the profile config Author: Tim Andersson Author Date: 2024-06-19 11:42:03 UTC fix: lxd-worker: put security.nesting: true in the profile config A recent commit introduced this new config option, however it was in the wrong place. It shouldn't be in the global lxd config, it should be configured in the default profile instead [1]. I figured out the correct lxd-init syntax by adding the config option to the profile with: lxc profile set default security.nesting=true And then checking the syntax with: lxc profile show default And adding the same syntax to our creation of the lxd-init file in armhf-lxd.userdata. And then to double check the config option, I launched an instance and double checked the config with: lxc config show <instance-name> --expanded When setting the config option in the default profile, the config option doesn't show up without the --expanded flag. The details on the security.nesting instance option can be found at [2]. [1] https://discuss.linuxcontainers.org/t/where-to-set-lxc-config-defaults-on-a-snap-installation/14595/4 [2] https://documentation.ubuntu.com/lxd/en/latest/reference/instance_options/
lxd-security-nesting-true	2024-06-19 10:01:55 UTC 2024-06-19	fix: lxd-worker: add security.nesting=true to lxd config Author: Tim Andersson Author Date: 2024-06-19 07:16:53 UTC fix: lxd-worker: add security.nesting=true to lxd config There's a version of systemd in oracular-proposed which is purported to break armhf tests (for oracular) once it migrates to the release pocket. TLDR; Any systemd units with credentials on unprivileged containers will fail on oracular tests with the new version of systemd in proposed. This would cause systemd-tmpfiles-setup.service to be broken on the lxd containers, which is a service which creates /var/run/utmp, which is how runlevel is stored. runlevel is checked in lib/VirtSubproc.py [1] in the wait_booted function. So, subsequently, wait_booted would eventually timeout, as systemd-tmpfiles-setup.service would never store runlevel appropriately on the testbed. The workaround was discussed [2] between the systemd maintainer (enr0n) and the lxd team, and the solution was to enable security.nesting for the lxd containers running our armhf tests. security.nesting simply allows for nested containerisation. [3] I tried to find a concrete piece of documentation about where this specific config flag should go in the lxd config, however, I couldn't find anything concrete, apart from these comments [4] and [5]. To summarise, we would be hitting [6] because of [7]. [1] https://salsa.debian.org/ubuntu-ci-team/autopkgtest/-/blob/master/lib/VirtSubproc.py?ref_type=heads#L454 [2] https://github.com/canonical/lxd/issues/13631 [3] https://discuss.linuxcontainers.org/t/what-does-security-nesting-true/7156/4 [4] https://discuss.linuxcontainers.org/t/what-does-security-nesting-true/7156/5 [5] https://discuss.linuxcontainers.org/t/what-does-security-nesting-true/7156/6 [6] https://bugs.launchpad.net/ubuntu/+source/autopkgtest/+bug/1998943 [7] https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/2046486
fix-retry-url	2024-06-19 08:50:56 UTC 2024-06-19	fix: web: fix retry buttons on results page and user page Author: Tim Andersson Author Date: 2024-06-19 06:57:26 UTC fix: web: fix retry buttons on results page and user page This was a change that was missed in the user page MP. The base_url variable wasn't inherited by the macro, causing the retry url to be the current url appended with the retry arguments (package, release, arch, triggers, etc). Obviously this didn't work and users couldn't trigger retries from the webpage. Another issue was that the package, release and version variables weren't inherited by the macro - obvious in retrospect. This commit amends the two issues by: - using url_for('index_root') [1] instead of base_url - explicitly passing package, release, and arch to the results_table_core macro [1] https://flask.palletsprojects.com/en/3.0.x/api/#flask.Flask.url_for
web-reactive-status-fix	2024-06-18 13:35:14 UTC 2024-06-18	fix: web: Don't let autopkgtest-web unit stay in "maintenance" status Author: Tim Andersson Author Date: 2024-06-18 13:26:44 UTC fix: web: Don't let autopkgtest-web unit stay in "maintenance" status In prod, with the recent changes to the web unit regarding restarting the services, this change was missed. Without it, if the systemd units get written, and none of them have changed, the status will never get set to active. The status getting set to active was intended to be done in the `restart_all_autopkgtest_web_services` function, however, if the files haven't changed, this won't happen. This commit amends the issue by setting the status to active at the end of the `set_up_systemd_units` function. If the units have changed, it'll be set back to maintenance in the `restart_all_autopkgtest_web_services` function, so there won't be any false "active" statuses.
cleanup-ppa-containers-small-bugfix	2024-06-17 14:50:13 UTC 2024-06-17	fix: cleanup-ppa-containers: datetime type comparison bugfix Author: Tim Andersson Author Date: 2024-06-17 14:50:13 UTC fix: cleanup-ppa-containers: datetime type comparison bugfix There was a small bug which got missed in the initial MP of this script. This commit amends the bug by not converting the datetime object to a timestamp prior to comparison with another datetime object.
lxd-armhf-kill-openstack-server-traceback	2024-06-14 13:02:41 UTC 2024-06-14	fix: worker: don't try to kill openstack servers on the lxd-worker Author: Tim Andersson Author Date: 2024-06-14 12:27:47 UTC fix: worker: don't try to kill openstack servers on the lxd-worker This commit fixes a bug which was causing the following traceback on the lxd worker: Traceback (most recent call last): File "/home/ubuntu/autopkgtest-cloud/worker/worker", line 1716, in <module> main() File "/home/ubuntu/autopkgtest-cloud/worker/worker", line 1709, in main queue.wait() File "/usr/lib/python3/dist-packages/amqplib/client_0_8/abstract_channel.py", line 97, in wait return self.dispatch_method(method_sig, args, content) File "/usr/lib/python3/dist-packages/amqplib/client_0_8/abstract_channel.py", line 117, in dispatch_me> return amqp_method(self, args, content) File "/usr/lib/python3/dist-packages/amqplib/client_0_8/channel.py", line 2060, in _basic_deliver func(msg) File "/home/ubuntu/autopkgtest-cloud/worker/worker", line 1360, in request kill_openstack_server(test_uuid) File "/home/ubuntu/autopkgtest-cloud/worker/worker", line 647, in kill_openstack_server auth_url=os.environ["OS_AUTH_URL"], File "/usr/lib/python3.8/os.py", line 675, in __getitem__ raise KeyError(key) from None KeyError: 'OS_AUTH_URL' This is fixed by checking to see if the OS_IDENTITY_API_VERSION env variable is present - if it's not, we're on the lxd worker. In this case, the kill_openstack_server function does nothing. This commit also moves some logging surrounding the kill_openstack_server function into the function itself, to avoid duplication of code.
user-specific-page	2024-06-13 15:54:18 UTC 2024-06-13	refactor: web: make browse-results.html use the results_table_core macro Author: Tim Andersson Author Date: 2024-04-17 17:10:32 UTC refactor: web: make browse-results.html use the results_table_core macro
tim-stats	2024-06-13 08:53:39 UTC 2024-06-13	check for armhf results with log message Author: Tim Andersson Author Date: 2024-06-13 08:53:39 UTC check for armhf results with log message
publish-db-fix	2024-06-12 16:34:25 UTC 2024-06-12	fix: web: use ftpmaster.internal for publish-db Author: Tim Andersson Author Date: 2024-06-12 16:15:39 UTC fix: web: use ftpmaster.internal for publish-db This commit modifies the publish-db script to use ftpmaster.internal instead of archive.ubuntu.com. ftpmaster.internal is obviously a lot faster and should help improve the speed of the publish-db script. The script was recently failing hitting various urls at archive.ubuntu.com, with a ConnectionResetError. This commit also adds that exception to the try except block which attempts to download the Sources.gz file for a component/pocket/release combination, just to stop the script from completely failing if this happens again. As mentioned, the script was failing completely, meaning the public database wasn't getting updated at all.
cleanup_old_ppa_containers	2024-06-12 13:43:18 UTC 2024-06-12	feat: cloud: add script for cleaning up old ppa containers in swift Author: Tim Andersson Author Date: 2023-08-03 10:22:19 UTC feat: cloud: add script for cleaning up old ppa containers in swift The swift database is currently a bit overloaded with lots of results from people testing PPAs, with some results being from a very long time ago. This is obviously less than ideal and after some recent discussion 26 weeks was decided to be an appropriate amount of time for a max age for PPA results. This commit adds a script, which iterates through all of the containers in swift, skips the distro results, the db backups container, and any upstream containers, and removes all ppa results older than the specified time. This script should be run when we open up a new series, and thus this step has been added to the section in `docs/administration.rst` regarding opening up a new series. Since the duration between releases is roughly 26 weeks, I think this makes sense.
restart-worker-on-charm-update	2024-06-11 10:08:22 UTC 2024-06-11	fix: web: restart apache2 and autopkgtest-web.target on charm update Author: Tim Andersson Author Date: 2024-06-11 09:42:08 UTC fix: web: restart apache2 and autopkgtest-web.target on charm update This commit introduces a new mechanism in the `upgrade-charm` hook, which restarts the apache2.service and the autopkgtest-web.target systemd units. This will make the webpage take into effect any changes to the webcontrol code, and make any services that have changes to the code or service files have those changes take immediate effect. Prior to this commit, restarting apache2 and the autopkgtest-web target was something that was done manually by an admin. This can be easily overlooked and should be automated. To implement this fix, the `upgrade-charm` [1] hook is utilised. This is a hook that runs every time a unit is undergoing an upgrade. It is customisable, and one can add any number of steps to run through in the event of a charm upgrade. For this purpose, the commit adds two calls: `systemctl restart apache2.service` `systemctl restart autopkgtest-web.target` to the hook. `apache2.service` is not a part of autopkgtest-web.target. [1] https://juju.is/docs/sdk/upgrade-charm-event
improve-handling-cache-amqp-failures	2024-06-11 08:09:36 UTC 2024-06-11	fix: web: make send_amqp_request function have a timeout Author: Tim Andersson Author Date: 2024-01-17 11:26:38 UTC fix: web: make send_amqp_request function have a timeout We were recently having some issues with production, where requesting a test via the webpage would result in the page eventually telling the user the server has timed out. This was because the send_amqp_request function was failing, but the library used to send the request has no internal timeout. send_amqp_request was failing because the rabbitmq-server ran out of memory. This commit amends the above issue by adding a timeout for the send_amqp_request function in the form of a signal handler, via the `signal` python module. The other issue, UX wise, is when rabbitmq is mid-restart, and the user is given this unhelpful error: ``` local variable `msg` referenced before assignment ``` The try/except block used to catch the TimeoutException also catches the UnboundLocalError exception. In the case rabbitmq is mid-restart, or completely down, this block now catches the issue. This commit also introduces a new exception class, QueueDead. This exception is used to give the user a more helpful error message, based on the two issues this commit catches, and overall, just improve the UX. The message from QueueDead is prepended with our generic "A server error has occurred" message, which includes details of how to contact members of the QA team.
preserve-set-correct-content-encoding	2024-06-08 12:18:28 UTC 2024-06-08	cloud-worker: feat: script for setting correct content encoding for logfiles Author: Tim Andersson Author Date: 2023-12-05 16:28:19 UTC cloud-worker: feat: script for setting correct content encoding for logfiles There was an issue when migrating our swift storage where the objects from our old swift storage were copied successfully but had an incorrect content-encoding for the logfiles, which meant users had to download the logs and couldn't view them in their browser. This script added in this commit sets the correct content encoding for all logfiles in all autopkgtest-$release containers. So it only does so for the distro related tests, no PPAs, no upstream tests, but this is intentional - it takes so long to run, and distro test results are a higher priority. The script, hopefully, will never be required again, but I'm preserving it here, just in case.
rabbitmq-cleanup-fix	2024-06-08 10:09:39 UTC 2024-06-08	fix: mojo: Fix deployment of rabbitmq cleanup script Author: Tim Andersson Author Date: 2024-04-02 09:03:28 UTC fix: mojo: Fix deployment of rabbitmq cleanup script Prior to this commit, the rabbitmq cleanup script wasn't getting copied over properly to the rabbitmq unit, causing the script to fail when rabbitmq required restarting. This commit creates a file for the rabbitmq cleanup script, in the autopkgtest-cloud repo, instead of just in the deployment script, as well as the necessary service files. The script now gets deployed as intended and functions as intended. This is a temporary workaround still until we get to the bottom of the rabbitmq restarts.
restart-autopkgtest-rgn-arch-services-on-worker-conf-file-change	2024-06-07 13:23:00 UTC 2024-06-07	experimental and WIP: restart autopkgtest (specific) services on worker conf ... Author: Tim Andersson Author Date: 2024-06-07 13:23:00 UTC experimental and WIP: restart autopkgtest (specific) services on worker conf file changes
restart-autopkgtest-services-on-worker-conf-file-change	2024-06-07 12:38:44 UTC 2024-06-07	fix: worker: restart autopkgtest.target when worker code has changed Author: Tim Andersson Author Date: 2024-06-06 10:23:31 UTC fix: worker: restart autopkgtest.target when worker code has changed We've, up until this point, had the finicky issue of requiring the autopkgtest admins to manually restart the autopkgtest.target systemd target upon a charm update. This is problematic as it's something we could easily miss/forget, and also requires us to discuss whether a restart of the services is necessary when we're deciding whether or not to update the charms. This commit utilises the `any_file_changed` function from `charms.reactive.helpers`. The reactive part of the cloud worker charm now checks to see if there's been any changes to the worker code, and if so, sets the `autopkgtest.target-restart-needed` flag, restarting autopkgtest.target.
apache_logging_monitoring	2024-06-07 10:54:37 UTC 2024-06-07	feat: web: add apache monitoring and reporting to autopkgtest-web Author: Tim Andersson Author Date: 2023-08-07 15:26:04 UTC feat: web: add apache monitoring and reporting to autopkgtest-web This commit introduces a new script, `apache-request-monitoring`, which monitors the exit codes of http requests to our apache server. It captures the exit code and the count of said exit code in the last 5 minutes. This commit also adds the necessary service file, and adds the needed juju config options for the charm (the influx creds). The panel for this visualisation already exists on Grafana [1]. Once this script starts running in prod, we should start seeing some nice visualisations. [1] https://ubuntu-release.kpi.ubuntu.com/d/76Oe_0-Gz/autopkgtest?orgId=1&refresh=5m&from=1717736107186&to=1717757707187&var-instance=production&viewPanel=20
dont-let-cache-amqp-hang	2024-06-05 12:58:14 UTC 2024-06-05	fix: web: utilise TimeoutStartSec to stop cache-amqp from ever hanging Author: Tim Andersson Author Date: 2024-05-02 11:20:54 UTC fix: web: utilise TimeoutStartSec to stop cache-amqp from ever hanging cache-amqp would previously hang when there was a fault with the semaphore queues - a brittle mechanism which we intend to fix in the future if we decide to move back to a distributed system r.e. the autopkgtest-web units. When cache-amqp hangs, our KPI is no longer indicative of our queue size, which is very detrimental to our observability. The issue with hanging is sorted quite easily, via a systemctl restart of the cache-amqp service, so here we add a maximum runtime for cache-amqp of ten minutes. The service is triggered minutely and thus ten minutes is quite a conservative max runtime, but we don't want to start restarting the service prematurely either. There is the possibility of course that cache-amqp could hang for a few minutes and subsequently recover. We integrate this "maximum runtime" by utilising TimeoutStartSec [1], which monitors the time the ExecStart call is taking to complete. [1] https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#TimeoutStartSec=
fix-systemd-stop-restart-losing-jobs	2024-06-04 15:23:57 UTC 2024-06-04	fix: worker: actually save messages when service stopped by systemd Author: Tim Andersson Author Date: 2024-06-04 10:26:36 UTC fix: worker: actually save messages when service stopped by systemd Previously, we had an iteration of this functionality which instead utilised checking the exit code of the autopkgtest subprocess for a -15 code, utilising the mechanism detailed in [1]. This was brittle - in the case the restart was executed at the time before the VM for the test was in BUILD state, the autopkgtest subprocess would exit with code 1 and the test request would be lost. This commit amends the issue by instead utilising the signal handlers in the worker code. This commit also re-introduces the documentation lost by the revert commit prior to this one. Fixes LP: #2067714 [1] https://docs.python.org/3/library/subprocess.html#subprocess.CompletedProcess.returncode
revert-innappropriate-fail-strings	2024-06-04 07:55:48 UTC 2024-06-04	Revert "fix: worker: add fail strings for systemd failures and postfix failures" Author: Tim Andersson Author Date: 2024-06-04 07:55:48 UTC Revert "fix: worker: add fail strings for systemd failures and postfix failures" This reverts commit 3a33ae7e20701d36f2dbb680a7875d6b9ded45b5.
grafana-agent-setup	2024-05-23 07:43:49 UTC 2024-05-23	asdf needs a lil testing Author: Tim Andersson Author Date: 2024-05-23 07:43:49 UTC asdf needs a lil testing
worker-dont-remove-queue-item-systemctl-restart	2024-05-21 15:30:49 UTC 2024-05-21	fix: worker: don't ack message if worker was killed with USR1 (code -15) Author: Tim Andersson Author Date: 2024-05-02 16:43:58 UTC fix: worker: don't ack message if worker was killed with USR1 (code -15) With the recent introduction [1] of the easing of killing running tests - we hit an unforeseen issue! systemctl restarting the worker service currently falls into the block of logic that removes the openstack server AND removes the test request from the queue - oh no! This commit amends the issue by causing both systemctl stop and systemctl restart to be caught by the worker code, caught by our signal handlers, and in both cases, the amqp message to be explicitly requeued. This commit also amends the docs w.r.t. killing running tests. A different code must be specified to the kill command. This commit fixes bug LP: #2064582 For information about how python's subprocess module inherits exit codes from systemd signals, see [2]. [1] https://git.launchpad.net/autopkgtest-cloud/commit/?id=d0e3b2ddb3ac5fe4d62f879d2e1c9ca578797120 [2] https://docs.python.org/3/library/subprocess.html#subprocess.CompletedProcess.returncode
copy-security-group-fix	2024-05-17 07:42:41 UTC 2024-05-17	fix: cloud: autopkgtest@.service always copy default security group Author: Tim Andersson Author Date: 2024-05-17 07:38:19 UTC fix: cloud: autopkgtest@.service always copy default security group This commit changes the service file for our main autopkgtest service. Prior to this commit the copy-security-group script would be copying a security group based on the name of the service i.e. autopkgtest@lcy02... This change is committed in the hopes that our security group usage will be more robust if we continuously copy only from the default group. It'll fail if the default group doesn't exist. But this is probably a good thing. We had an issue where our security groups seemed to start having the wrong rules out of nowhere. An assumption could be that one security group creation partially failed, without having the correct rules, causing a cascading effect where subsequent secgroups inherited the borked rules.
kill-server-on-retries	2024-05-16 07:39:05 UTC 2024-05-16	fix: worker: always kill openstack server on retries Author: Tim Andersson Author Date: 2024-05-16 07:39:05 UTC fix: worker: always kill openstack server on retries
remove-dead-lxd-bos02	2024-05-13 14:11:56 UTC 2024-05-13	service-bundle: remove lxd-armhf 2,5,7 (bos02) Author: Tim Andersson Author Date: 2024-05-13 14:11:56 UTC service-bundle: remove lxd-armhf 2,5,7 (bos02)
die-roll-mechanism-stable-vs-devel	2024-05-13 08:00:25 UTC 2024-05-13	feat: cloud: add die roll mechanism for stable vs devel releases Author: Tim Andersson Author Date: 2024-05-10 11:51:47 UTC feat: cloud: add die roll mechanism for stable vs devel releases This mechanism also handles the case where the worker unit has an outdated version of python3-distro-info.
service-bundle-n-workers-update	2024-05-10 14:55:52 UTC 2024-05-10	service-bundle: align n-workers with values in prod Author: Tim Andersson Author Date: 2024-05-10 14:55:52 UTC service-bundle: align n-workers with values in prod
admin-page-no-fail-when-empty	2024-05-09 15:02:23 UTC 2024-05-09	fix: web: no traceback on empty admin page Author: Tim Andersson Author Date: 2024-05-09 10:35:30 UTC fix: web: no traceback on empty admin page If the db query in the admin page heuristic returns no results, the admin page traces back as the heuristic attempts to multiply a NoneType. This commit fixes the issue by first checking if duration_avg is not None before attempting the aforementioned multiplication.
web-charm-config-dir-fix	2024-05-09 13:37:23 UTC 2024-05-09	fix: web: make sure to create `.config/autopkgtest-web` directory in reactive... Author: Tim Andersson Author Date: 2024-05-09 13:36:18 UTC fix: web: make sure to create `.config/autopkgtest-web` directory in reactive charm
auto-queue-cleanup	2024-05-08 14:02:48 UTC 2024-05-08	feat: web: add queue-cleaner script Author: Tim Andersson Author Date: 2024-04-23 10:48:35 UTC feat: web: add queue-cleaner script This script removes unnecessary items from the queue by checking for items in the queue that have triggers that aren't in the proposed or release pockets. It also checks the queues for any duplicate items - after a specific item has been found, if another queue item has the same parameters, it is removed from the queue. This script runs every 15 minutes, and should help our throughput by not having unnecessary queue items. This commit also adds a flock call to the cache-amqp and queue-cleaner services to ensure that these services never run at the same time as to avoid any issues with parallel reads of the queues.
stop-tests-from-webpage	2024-05-02 16:37:02 UTC 2024-05-02	feat: cloud&web: add option to stop test from webpage Author: Tim Andersson Author Date: 2024-04-23 15:02:54 UTC feat: cloud&web: add option to stop test from webpage This commit introduces the functionality of being able to kill a currently running test from the autopkgtest webpage. Test-killer It introduces a new script, test-killer, which runs on the cloud worker units as a systemd service. test-killer listens to requests via amqp on the "tests-to-kill" exchange. Test uuid's are part of the message sent on this exchange to test-killer, and test-killer then kills the test using the test uuid. The initial message in the test-killer queue will look as such: { "uuid": "b864593b-82e2-424e-bfe7-f37748dbd047", "not-running-on": [], } The "not-running-on" list gets appended when a worker unit checks for the test with the given uuid and the test isn't present on that specific worker unit. test-killer appends the hostname of the current worker unit to this list. When the length of the "not-running-on" list is equal to the number of worker units, the message is removed from the queue if the uuid is not found in queues.json. In this case we assume the test has finished before we've had a chance to kill it. In this way, you can simply pass test-killer a uuid, and via amqp it'll check for the test on every worker unit. web changes The running page now displays a link under each running job (for admins only) which redirects to a new app under webcontrol - test-manager. test-manager has only one endpoint, similar to request/app.py. This endpoint is only available to a select few admins. This list of admins is now a config option for the charm (admin-nicks). This is in the service-bundle, with a sensible default set. This endpoint can be passed a uuid (uuid=$uuid), which then submits that uuid to the tests-to-kill exchange. test-manager first checks that the uuid is present in running.json, however, as to avoid wasting resources on killing a test that isn't already running. If the given uuid is found in running.json, that uuid is sent via amqp to the test-killer services on the various worker units, where the test is then killed.
upgrade-charm-to-docs	2024-05-02 14:22:20 UTC 2024-05-02	docs: remove mention of new dependencies requiring a unit replacement Author: Tim Andersson Author Date: 2024-03-28 14:39:04 UTC docs: remove mention of new dependencies requiring a unit replacement
filter-amqp-all-queues-option	2024-04-30 13:16:33 UTC 2024-04-30	feat: cloud: add option to clean regex from all queues in filter-amqp Author: Tim Andersson Author Date: 2024-03-01 11:57:42 UTC feat: cloud: add option to clean regex from all queues in filter-amqp If you pass all instead of the queue name, filter-amqp will remove test requests for all queues matching the regex. This commit also renames --all to --all-items-in-queue, as to avoid confusion when running filter-amqp with the queue name set to all.
fix-cache-amqp-creds	2024-04-30 10:57:09 UTC 2024-04-30	fix: web: fix cache-amqp incorrectly parsing private jobs Author: Tim Andersson Author Date: 2024-04-30 09:04:22 UTC fix: web: fix cache-amqp incorrectly parsing private jobs Some private jobs were recently queued without the newline character present in the test request string. Due to the try-except we previously had here, we would fall back to params={}. This was problematic, as for private jobs we rely on checking the key value pairs of the test request message to accurately denote them as private jobs. This commit marks all test requests that are in the incorrect format as "malformed request" in queued.json.
stop-looping-fix	2024-04-29 16:08:21 UTC 2024-04-29	fix: worker: also fake up files in the case of unidentified testbed failure Author: Tim Andersson Author Date: 2024-04-29 16:08:21 UTC fix: worker: also fake up files in the case of unidentified testbed failure
seed-new-release-update-auth-version	2024-04-26 19:19:21 UTC 2024-04-26	fix: cloud: update swift auth version for seed-new-release Author: Tim Andersson Author Date: 2024-04-26 19:19:21 UTC fix: cloud: update swift auth version for seed-new-release The auth version on the "new" bastion is 3.0.
uuid-db-column-unique-constraint	2024-04-26 13:34:15 UTC 2024-04-26	fix: web: add UNIQUE constraint to uuid column creation in helpers/utils.py `... Author: Tim Andersson Author Date: 2024-04-26 13:33:33 UTC fix: web: add UNIQUE constraint to uuid column creation in helpers/utils.py `init_db` Whilst this doesn't fix the issue of two "write requests" going into the sqlite-writer queue, this commit would still mean we get no duplicate results. Even in the case of two duplicate queue messages, this commit would just ensure that the original entry is just replaced with the duplicate entry.
no-double-download-results	2024-04-26 13:34:15 UTC 2024-04-26	fix: web: add UNIQUE constraint to uuid column creation in helpers/utils.py `... Author: Tim Andersson Author Date: 2024-04-26 13:33:33 UTC fix: web: add UNIQUE constraint to uuid column creation in helpers/utils.py `init_db` Whilst this doesn't fix the issue of two "write requests" going into the sqlite-writer queue, this commit would still mean we get no duplicate results. Even in the case of two duplicate queue messages, this commit would just ensure that the original entry is just replaced with the duplicate entry.
fix-killing-tests-api-version-parsing	2024-04-25 09:30:05 UTC 2024-04-25	fix: worker: fix api version check for datacentres where this isn't explicitl... Author: Tim Andersson Author Date: 2024-04-25 08:52:40 UTC fix: worker: fix api version check for datacentres where this isn't explicitly defined Not a critical bug, but killing any tests on datacentres without the api version explicitly defined wasn't killing the test itself in a structured manner, but just killing the worker service. Fixes bug LP: #2063429
three-tmpfails-no-looping-please	2024-04-24 15:26:53 UTC 2024-04-24	fix: worker: Never, ever let tests permanently loop Author: Tim Andersson Author Date: 2024-04-24 13:09:50 UTC fix: worker: Never, ever let tests permanently loop This commit completely removes the mechanism in which a worker is assumed to be "broken" in some way. This mechanism has in the past on countless occasions caused tests to permanently loop, and we've decided to kill it with fire.
temp-disable-content-length	2024-04-24 15:09:45 UTC 2024-04-24	fix: web: disable content-length header for static files Author: Tim Andersson Author Date: 2024-04-24 13:13:46 UTC fix: web: disable content-length header for static files britney still as of today utilises the content-length header if it is present in the autopkgtest.db download. However, after recent changes [1] to the apache2 package for focal, we've discovered that the content-length header is no longer 100% accurate. Because of this, we will disable the content-length header. [1] https://bugs.launchpad.net/ubuntu/+source/apache2/+bug/2061816
fix-tims-recent-docs	2024-04-23 15:11:28 UTC 2024-04-23	docs: fix missing lines inbetween code-block and code for "Resizing Volumes" ... Author: Tim Andersson Author Date: 2024-04-23 15:11:28 UTC docs: fix missing lines inbetween code-block and code for "Resizing Volumes" and "Killing Running Tests" sections
align-with-prod	2024-04-23 08:16:34 UTC 2024-04-23	service-bundle: align with service bundle in prod Author: Tim Andersson Author Date: 2024-04-23 08:12:03 UTC service-bundle: align with service bundle in prod
make-killing-tests-less-painful	2024-04-22 13:26:41 UTC 2024-04-22	docs: add section on killing a currently running test Author: Tim Andersson Author Date: 2024-04-22 13:06:31 UTC docs: add section on killing a currently running test
worker-upstream-percentage	2024-04-22 10:58:54 UTC 2024-04-22	feat: cloud: add upstream percentage as juju config option Author: Tim Andersson Author Date: 2024-04-15 11:20:31 UTC feat: cloud: add upstream percentage as juju config option This was a feature request from Brian Murray. Using this new feature, we can, on-the-fly, change the percentage of jobs that will willingly take upstream tests. This can be useful in situations where we'd like to prioritise distro tests or prioritise upstream tests. To modify, on the fly: juju config autopkgtest-$type-worker worker-upstream-percentage="$perc" Where $type is [cloud \| lxd] and $perc is an integer between 1 and 100.
more-exceptions-worker-put-object	2024-04-22 09:19:16 UTC 2024-04-22	fix: worker: Catch all exceptions in the try-except in swiftclient put_object... Author: Tim Andersson Author Date: 2024-04-22 08:48:40 UTC fix: worker: Catch all exceptions in the try-except in swiftclient put_object call We've been seeing recurring swift errors with the following traceback: ``` Apr 22 00:47:27 juju-7f2275-prod-proposed-migration-environment-3 sh[1767534]: File "/home/ubuntu/autopkgtest-cloud/worker/worker", line 1388, in request Apr 22 00:47:27 juju-7f2275-prod-proposed-migration-environment-3 sh[1767534]: swiftclient.put_object( ... Apr 22 00:47:27 juju-7f2275-prod-proposed-migration-environment-3 sh[1767534]: raise ConnectionError(err, request=request) Apr 22 00:47:27 juju-7f2275-prod-proposed-migration-environment-3 sh[1767534]: requests.exceptions.ConnectionError: ('Connection aborted.', OSError("(32, 'EPIPE')")) ``` This exception isn't currently caught by the except statement meaning put_object won't retry in this case.
fix-trailing-whitespace	2024-04-18 13:23:38 UTC 2024-04-18	docs: fix trailing whitespace in administration.rst Author: Tim Andersson Author Date: 2024-04-18 13:23:38 UTC docs: fix trailing whitespace in administration.rst
resize-docs	2024-04-18 09:08:40 UTC 2024-04-18	docs: add section on resizing ceph (tmp) partitions Author: Tim Andersson Author Date: 2024-04-15 17:00:18 UTC docs: add section on resizing ceph (tmp) partitions This commit adds a section detailing the process to increase the size of our cloud worker partitions, details when this may be a pertinent step to take, and also concisely details our findings related to the juju units recognising the increase in disk size.
fix-apache-transfer-encoding-for-autopkgtest-db	2024-04-16 14:45:03 UTC 2024-04-16	fix: web: fix no content-length header for static file endpoints Author: Tim Andersson Author Date: 2024-04-16 14:06:10 UTC fix: web: fix no content-length header for static file endpoints britney recently started having an issue with downloading the autopkgtest.db - suddenly the static endpoint had stopped returning the "Content-Length" header for these static get requests. This turned out to be a recent change from a security update for the apache2 package, see below: https://launchpad.net/ubuntu/+source/apache2/2.4.41-4ubuntu3.17 https://launchpadlibrarian.net/724225454/apache2_2.4.41-4ubuntu3.16_2.4.41-4ubuntu3.17.diff.gz It was verified in staging that apache2/2.4.41-4ubuntu3.17 was the problematic version - I tested apache2/2.4.41-4ubuntu3.16 and the Content-Length header was present when downloading the db. The header was no longer present because apache2 now serves static files by default with the "chunked" transfer-encoding, rather than the "identity" transfer-encoding (which includes the Content-Length header, Content-Length header is not compatible with a chunked transfer encoding, see [1]). This bug: https://bugs.launchpad.net/ubuntu/+source/apache2/+bug/2061816 Was opened, where a discussion with the package maintainer was had, and the conclusion was that this new behaviour by default was intended and is a necessary security patch. I was pointed to this thread: https://bz.apache.org/bugzilla/show_bug.cgi?id=68872 Which helpfully had a solution that I modified for our web app. All static files with the config present in this commit now get served with the "identity" transfer-encoding and have the appropriate Content-Length header. [1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Transfer-Encoding#directives
docs-add-queue-cleanup-section	2024-04-15 11:22:27 UTC 2024-04-15	docs: add section on how to do queue cleanup of obsoleted packages Author: Tim Andersson Author Date: 2024-03-19 12:33:25 UTC docs: add section on how to do queue cleanup of obsoleted packages
lp-2060213-fix	2024-04-11 13:02:02 UTC 2024-04-11	fix: cloud: fix unattended upgrades interrupting lxd tests Author: Tim Andersson Author Date: 2024-04-11 13:02:02 UTC fix: cloud: fix unattended upgrades interrupting lxd tests Prior to this commit, we were seeing instances of lxd armhf tests failing with the following: 954s E: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 763 (apt-get) 954s E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it? This is because unattended upgrades weren't disabled in our lxd armhf runners, whereas they are for all other architectures. This commit introduces a new setup script, setup-canonical-lxd.sh This script runs all the necessary setup commands for our lxd-armhf runners. This commit also moves the setup-commands from the service bundle for the autopkgtest-lxd-worker application into the setup-canonical-lxd.sh script. It replaces the setup-commands entry in the service bundle with the setup script, rather than the individual commands. Fixes bug LP: #2060213
service-message	2024-04-11 12:06:07 UTC 2024-04-11	feat: web: add the possibility of displaying a service message with juju config Author: Tim Andersson Author Date: 2024-04-10 15:43:30 UTC feat: web: add the possibility of displaying a service message with juju config This commit adds the possibility of displaying a short service message at the top of all pages. This is useful in situations where there's an issue/recent bug that'll affect all users. To add a service message, use: juju config autopkgtest-web important-service-message "my message". The message is displayed with black text on a yellow background. Displaying the service message requires a restart to apache2.
backup-worker-logs	2024-04-10 10:10:24 UTC 2024-04-10	feat: cloud-worker: Add service and timer for store-worker-logs Author: Tim Andersson Author Date: 2023-12-11 11:58:23 UTC feat: cloud-worker: Add service and timer for store-worker-logs
lxd-cleanup-srv-files	2024-04-10 10:05:01 UTC 2024-04-10	autopkgtest-cloud-worker: fix: remove old service files for lxd units Author: Tim Andersson Author Date: 2023-12-04 12:35:31 UTC autopkgtest-cloud-worker: fix: remove old service files for lxd units Whenever we introduce a new lxd remote and remove an old one, the systemd services for that specific remote are still leftover. This MP fixes that, and should make debugging armhf problems a bit easier.
amend-inserts-paramstyle-named	2024-04-09 09:30:09 UTC 2024-04-09	fix: web: use sqlite3.paramstyled = "named" where necessary Author: Tim Andersson Author Date: 2024-03-11 10:52:37 UTC fix: web: use sqlite3.paramstyled = "named" where necessary This commit makes other db write operations use the named paramstyle for sqlite, rather than the default, which is qmark. This just makes DB inserts a bit cleaner, rather than passing a tuple, using named parameters is much safer, and adds the bonus of being able to pass dictionaries to DB inserts. To quote waveform, the named paramstyle is the "One True Paramstyle"! https://peps.python.org/pep-0249/#paramstyle
web-fix-config-dir-permissions	2024-04-08 16:40:10 UTC 2024-04-08	fix: web: fix file permissions of .config folder Author: Tim Andersson Author Date: 2024-04-08 16:24:46 UTC fix: web: fix file permissions of .config folder Prior to this commit, only the root user had read and write access to this directory, which is obviously less than ideal. This is due to some inherent behaviour of pathlib, see [1]. Because of this behaviour, the .config directory had these permissions, but the .config/autopkgtest-web/ directory had correct permissions. This commit also explicitly sets the permissions for the directories, just as an added way of ensuring these directories will have the correct permissions. [1] https://docs.python.org/3/library/pathlib.html#pathlib.Path.mkdir
fix-allowed-requestor-teams	2024-04-08 16:07:40 UTC 2024-04-08	fix: web: rstrip the team name for allowed-requestor-teams Author: Tim Andersson Author Date: 2024-04-08 16:04:31 UTC fix: web: rstrip the team name for allowed-requestor-teams Prior to this commit, whitespace characters from the indented yaml juju config options were preserved causing LP api calls to " example-team" or something along those lines.
generate-charm-inventory-amendment	2024-04-08 09:37:35 UTC 2024-04-08	fix: generate-charm-inventory: ignore all merge commits Author: Tim Andersson Author Date: 2024-04-08 09:37:35 UTC fix: generate-charm-inventory: ignore all merge commits
worker-re-enable-retries	2024-04-08 08:42:23 UTC 2024-04-08	fix: worker: re-enable retries Author: Tim Andersson Author Date: 2024-04-08 08:42:23 UTC fix: worker: re-enable retries
admin-page-running-for-logtail-mismatch-heuristic	2024-04-05 16:28:16 UTC 2024-04-05	feat: web: add tests to admin page which have a mismatch between the logtail ... Author: Tim Andersson Author Date: 2024-04-05 16:25:22 UTC feat: web: add tests to admin page which have a mismatch between the logtail timestamp and running_for value Fixes bug LP: #2058463
04052024-bos02-armhf-update	2024-04-05 16:26:39 UTC 2024-04-05	service-bundle: modify IPs for bos02 armhf in line with recent changes Author: Tim Andersson Author Date: 2024-04-05 16:26:39 UTC service-bundle: modify IPs for bos02 armhf in line with recent changes
preserve-a-p-in-queued-tests	2024-04-05 15:39:31 UTC 2024-04-05	fix: web: show all-proposed for queued tests on results pages Author: Tim Andersson Author Date: 2024-04-05 09:28:56 UTC fix: web: show all-proposed for queued tests on results pages
fix-ci-failures	2024-04-05 10:51:02 UTC 2024-04-05	fix: web: add __init__.py to private_results/ to fix CI failures Author: Tim Andersson Author Date: 2024-04-05 10:51:02 UTC fix: web: add __init__.py to private_results/ to fix CI failures
fix-amqp-status-collector	2024-04-04 13:33:40 UTC 2024-04-04	fix: web: Add RuntimeDirectoryPreserve to amqp-status-collector.service Author: Tim Andersson Author Date: 2024-04-04 08:06:29 UTC fix: web: Add RuntimeDirectoryPreserve to amqp-status-collector.service This fixes the following issue: - When RabbitMQ is unresponsive, the amqp-status-collector script fails repeatedly - When amqp-status-collector isn't running, the /run/amqp-status-collector/ directory is removed, due to the behaviour of RuntimeDirectory - When that directory is removed, running.json also gets removed - Lots of other functionality in webcontrol depends upon this file. Such as requests, and browsing the /running or results pages This commit fixes the issue by adding RuntimeDirectoryPreserve=yes to amqp-status-collector.service. This flag, when set to restart or yes, causes the runtime directory to not be removed when the systemd unit is down.
allowed-teams-to-juju-config	2024-04-03 12:00:18 UTC 2024-04-03	feat: web: move ALLOWED_TEAMS to juju config instead of being hardcoded in re... Author: Tim Andersson Author Date: 2024-02-27 14:48:22 UTC feat: web: move ALLOWED_TEAMS to juju config instead of being hardcoded in request/submit.py
lxd-metrics-update	2024-04-03 11:12:44 UTC 2024-04-03	fix: cloud: only check intended ips for autopkgtest-lxd-worker metrics Author: Tim Andersson Author Date: 2024-02-07 15:38:30 UTC fix: cloud: only check intended ips for autopkgtest-lxd-worker metrics This commit adds a fix to the lxd metrics - we don't have a metric right now which checks the remotes specified in the service bundle if they aren't present in lxc remote list on the autopkgtest-lxd-worker. So this checks the list of intended remotes and makes note of any intended remotes which aren't in lxc remote list. We also currently report on remotes which aren't specified in the service bundle, but I think that's fine to leave in the metrics as it's indicative of issues. This commit also writes lxc-remotes.json to the ~ directory on the lxd worker, as the metrics script now uses this information to more accurately report the metrics.
cloud-worker-tmp-cleanup	2024-04-03 11:09:56 UTC 2024-04-03	feat: cloud: add worker tmp cleanup config Author: Tim Andersson Author Date: 2024-02-16 11:23:49 UTC feat: cloud: add worker tmp cleanup config tmp doesn't get automatically cleaned up periodically, only on boot. This is problematic as any edge case worker errors that cause the worker script to exit before cleaning up the logfile directory leaves the entire directory in tmp, leading to low disk space errors. This commit introduces a config file which removes files and directories in /tmp that haven't been modified in the last 30 days. It adds the tmp cleanup config to the service bundle common options for the autopkgtest-cloud-worker application. It also adds the config option to layer.yaml. And it also writes the cleanup config to /etc/tmpfiles.d/tmp.conf
pull-amqp-push-amqp	2024-04-03 09:21:50 UTC 2024-04-03	feat: cloud: add pull-amqp and push-amqp scripts Author: Tim Andersson Author Date: 2024-03-27 12:35:30 UTC feat: cloud: add pull-amqp and push-amqp scripts pull-amqp is a script that pulls all messages from a queue. If the script is passed a regex, it will only pull the messages from the queue that match said regex. If the --empty arg is passed, it'll remove said messages. push-amqp is a script that simply pushes a message to a specified queue. The two scripts can be used in conjunction to easily shift specific queue messages from one queue to another, removing the need to craft a retry-autopkgtest-regressions command to shift tests between queues. push-amqp can also be used to push messages to other queues, like the sqlite-writer queue or the download-results queue. Fixes bug LP: #2059235
configparser-read-refactor	2024-04-02 17:23:40 UTC 2024-04-02	refactor: web&cloud: replace configparser.read with configparser.read_file or... Author: Tim Andersson Author Date: 2024-03-08 14:17:17 UTC refactor: web&cloud: replace configparser.read with configparser.read_file or read_string Also refactors all duplicate usage of configparser.read and shares common functions from helpers/utils.py, and amends unit tests in line with these changes
bump-workers-10-percent	2024-04-02 16:48:32 UTC 2024-04-02	service-bundle: bump all n-workers by 10% inline with recent quota changes Author: Tim Andersson Author Date: 2024-04-02 16:48:10 UTC service-bundle: bump all n-workers by 10% inline with recent quota changes
drop-all-all-proposed	2024-03-28 15:04:51 UTC 2024-03-28	fix: web: remove all all-proposed for noble Author: Tim Andersson Author Date: 2024-03-28 13:56:33 UTC fix: web: remove all all-proposed for noble
exit-code-14	2024-03-28 14:19:01 UTC 2024-03-28	fix: web: fix display of exit code 14 tests Author: Tim Andersson Author Date: 2024-03-28 14:19:01 UTC fix: web: fix display of exit code 14 tests We received a ping on IRC on 27/03/2024, about "exit code 14" being displayed on the results pages, which is somewhat confusing and nondescript. This mp fixes this by giving exit code 14 tests a human exit code that is descriptive w.r.t. the autopkgtest man page.
login-button	2024-03-28 10:48:11 UTC 2024-03-28	feat: web: share login details between browse.cgi and request.cgi Author: Tim Andersson Author Date: 2024-03-27 15:08:28 UTC feat: web: share login details between browse.cgi and request.cgi This commit makes our flask app share the flask session between browse.cgi and request.cgi. Users, when not logged in, will now see a "Login" button on the navbar. Clicking this will log them in using the pre-existing mechanism in request.cgi.
postfix-systemd-fail-strings	2024-03-28 10:10:24 UTC 2024-03-28	fix: worker: add fail strings for systemd failures and postfix failures Author: Tim Andersson Author Date: 2024-03-28 10:10:24 UTC fix: worker: add fail strings for systemd failures and postfix failures These failures seem to cause forever looping tests, so we need to correctly recognise them as fail strings.
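
The `uuid-db-column-unique-constraint` and `amend-inserts-paramstyle-named` entries above describe two related database behaviours: a UNIQUE `uuid` column combined with `INSERT OR REPLACE`, and inserts that take named parameters from a dictionary. The following is a minimal sketch of how those two pieces interact, not the repository's actual code. The table name `result`, its columns, and the sample values are assumptions made for illustration; only the `uuid` column and the `init_db` name come from the commit messages.

```python
import sqlite3


def init_db(conn):
    # Hypothetical schema: only the UNIQUE uuid column is taken from the
    # commit message; the table name and other columns are illustrative.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS result ("
        " uuid TEXT UNIQUE,"
        " series TEXT,"
        " exitcode INTEGER)"
    )


def write_result(conn, row):
    # Named placeholders (:name) are filled from a dict, so the order of
    # the values cannot be mixed up as it can with a positional tuple.
    conn.execute(
        "INSERT OR REPLACE INTO result (uuid, series, exitcode) "
        "VALUES (:uuid, :series, :exitcode)",
        row,
    )


conn = sqlite3.connect(":memory:")
init_db(conn)

first = {
    "uuid": "b864593b-82e2-424e-bfe7-f37748dbd047",
    "series": "mantic",  # sample value
    "exitcode": 0,
}
# A second write request for the same uuid, as with a duplicated queue message.
duplicate = dict(first, exitcode=4)

write_result(conn, first)
write_result(conn, duplicate)

# The UNIQUE constraint makes the second insert replace the first row
# instead of adding a duplicate.
rows = conn.execute("SELECT uuid, series, exitcode FROM result").fetchall()
print(rows)
```

Under these assumptions the table ends up with a single row holding the values from the second write. The same replacement happens for any two results that share a uuid, even when they belong to different releases.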

This repository contains Public information

Everyone can see this information.

Subscribers