~hyask/autopkgtest-cloud:skia/slower_running_state

Last commit made on 2024-06-20
Get this branch:
git clone -b skia/slower_running_state https://git.launchpad.net/~hyask/autopkgtest-cloud

Branch information

Name:
skia/slower_running_state
Repository:
lp:~hyask/autopkgtest-cloud

Recent commits

252573a... by Skia

feat: web: improve running page

* Self-document that the information is refreshed every 30 seconds.
* Swap the queue length section and the running packages section so
  that the queue length is visible right away, without needing to
  scroll down when too many packages are running.
* Some semantic improvements (h2 vs h3)

Succeeded
[SUCCEEDED] pre_commit:0 (build)
[SUCCEEDED] unit_tests:0 (build)
[SUCCEEDED] build_charms:0 (build)
f019281... by Skia

fix: worker: only send the current state every 30 seconds

This fixes a producer/consumer issue, leading to RabbitMQ consuming too
much memory and restarting every 2 hours when the infra is under high
load.

This boils down to this:
* Every worker sends its state every 10s to a RabbitMQ fanout
* This fanout distributes messages to one subscriber per web unit, in
  the `amqp-status-collector` service.
* This `amqp-status-collector` must then process every message to
  produce `running.json`, that is later used to generate the `/running`
  page.
* The performance of this `amqp-status-collector` script is bounded by
  its `process_message` function, which in turn is dominated by
  Python's slow JSON parsing.
* This function was benchmarked on the infra at about 25ms per call,
  meaning an average throughput of about 40 messages per second to
  consume the queues.
* With the currently deployed workers (110+22+22+22+22+50)*2+16*3=544,
  each sending every 10s, the producing part of that system can reach
  more than 50 messages per second (544/10 = 54.4).
* That means RabbitMQ will at some point start to accumulate messages
  in the queues, and finally get killed by the watchdog looking at its
  RAM consumption.
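
The arithmetic in the list above can be checked with a short sketch.
The worker counts, the 10s and 30s intervals, and the 25ms cost per
message come from this commit message; the function and constant names
are illustrative only, and the sketch assumes every worker sends exactly
one message per interval.

```python
# Back-of-the-envelope check of the producer/consumer rates described above.
# All names are illustrative; only the numbers come from the commit message.

WORKERS = (110 + 22 + 22 + 22 + 22 + 50) * 2 + 16 * 3  # deployed workers
PROCESS_MESSAGE_SECONDS = 0.025  # benchmarked cost of one `process_message` call


def produced_per_second(workers: int, interval_seconds: float) -> float:
    """Messages per second if each worker sends one state update per interval."""
    return workers / interval_seconds


def consumed_per_second(seconds_per_message: float) -> float:
    """Messages per second that a single subscriber can process."""
    return 1 / seconds_per_message


if __name__ == "__main__":
    capacity = consumed_per_second(PROCESS_MESSAGE_SECONDS)
    for interval in (10, 30):
        rate = produced_per_second(WORKERS, interval)
        verdict = "queue grows" if rate > capacity else "consumer keeps up"
        print(f"every {interval}s: {rate:.1f} msg/s produced, "
              f"{capacity:.0f} msg/s consumed -> {verdict}")
```

Under these assumptions, a 10s interval produces about 54 messages per
second against a capacity of about 40, while a 30s interval produces
about 18.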

A quick analysis of the logs shows that basically nobody relies on
refreshing the `/running` page every 10 seconds, so a refresh rate of 30
seconds should definitely be enough for most use-cases.
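
Sending the state at most once every 30 seconds could be sketched as
follows. This is not the actual worker code: `StateThrottle` and
`should_send` are hypothetical names, and the injectable clock exists
only to make the sketch easy to test.

```python
import time
from typing import Callable, Optional


class StateThrottle:
    """Allow a state update at most once per `interval` seconds.

    Hypothetical helper, not taken from the autopkgtest-cloud worker.
    """

    def __init__(self, interval: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self._clock = clock
        self._last_sent: Optional[float] = None  # time of the last allowed send

    def should_send(self) -> bool:
        """Return True and record the time if enough time has elapsed."""
        now = self._clock()
        if self._last_sent is None or now - self._last_sent >= self.interval:
            self._last_sent = now
            return True
        return False
```

A worker loop would call `should_send()` before publishing and skip the
publish when it returns `False`, which cuts the message rate without
changing how often the worker updates its own state internally.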

c1e01d2... by Skia

fix: stats: use UUID instead of run_id to join tables

`run_id` is not guaranteed to be unique, leading to some issues when
joining in the stats. For example, these two perfectly valid jobs have
the same `run_id`:
https://autopkgtest.ubuntu.com/results/autopkgtest-oracular/oracular/ppc64el/p/python-kubernetes/20240608_182736_d78ad@/log.gz
https://autopkgtest.ubuntu.com/results/autopkgtest-oracular/oracular/armhf/p/python-kubernetes/20240608_182736_d78ad@/log.gz
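
The effect on a join can be reproduced with a small in-memory SQLite
sketch. The table names, column layout, UUID values, and attempt counts
below are made up for illustration and do not reflect the real stats
schema; only the shared `run_id` and the two architectures come from the
example above.

```python
import sqlite3

# Hypothetical schema: not the real autopkgtest-cloud database layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE result (uuid TEXT PRIMARY KEY, run_id TEXT, arch TEXT);
    CREATE TABLE boot (uuid TEXT, run_id TEXT, attempts INTEGER);
""")

# Two distinct jobs that share the same run_id, as in the example above.
RUN_ID = "20240608_182736_d78ad@"
conn.executemany(
    "INSERT INTO result VALUES (?, ?, ?)",
    [("uuid-ppc64el", RUN_ID, "ppc64el"), ("uuid-armhf", RUN_ID, "armhf")],
)
# Placeholder attempt counts, invented for the sketch.
conn.executemany(
    "INSERT INTO boot VALUES (?, ?, ?)",
    [("uuid-ppc64el", RUN_ID, 1), ("uuid-armhf", RUN_ID, 3)],
)

# Joining on run_id pairs every result row with every boot row that
# shares the run_id, so each job also picks up the other job's data.
rows_by_run_id = conn.execute(
    "SELECT COUNT(*) FROM result JOIN boot USING (run_id)"
).fetchone()[0]

# Joining on uuid matches each job with its own row only.
rows_by_uuid = conn.execute(
    "SELECT COUNT(*) FROM result JOIN boot USING (uuid)"
).fetchone()[0]

print(rows_by_run_id, rows_by_uuid)
```

With two jobs sharing a `run_id`, the `run_id` join yields four rows
where two are expected, while the `uuid` join yields exactly two.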

d9a5d07... by Skia

Merge branch 'skia/stats_boot_attempts'

https://code.launchpad.net/~hyask/autopkgtest-cloud/+git/autopkgtest-cloud/+merge/467811

a546026... by Skia

stats: show basic stats on the database

Showing those when connecting to the database gives the analyst some
insight into what data is available to analyze.

Succeeded
[SUCCEEDED] pre_commit:0 (build)
[SUCCEEDED] unit_tests:0 (build)
[SUCCEEDED] build_charms:0 (build)
850cdea... by Skia

stats: improve graph output

This special syntax shows a clear circle at each data point. This is
useful when a series has only a single data point, and thus no line
linking two points.

Succeeded
[SUCCEEDED] pre_commit:0 (build)
[SUCCEEDED] unit_tests:0 (build)
[SUCCEEDED] build_charms:0 (build)
c908f76... by Skia

Move the stats scripts to 'dev-tools'

While they leverage the web database, they clearly don't belong to the
web charm. They feel more appropriate in `dev-tools`, at the root.

973213c... by Skia

feat: stats: add the 'Boot attempts distribution'

This new cell shows detailed boot attempts distribution, to better
understand how difficult it is to spawn a VM in a particular datacenter.

Succeeded
[SUCCEEDED] pre_commit:0 (build)
[SUCCEEDED] unit_tests:0 (build)
[SUCCEEDED] build_charms:0 (build)
939ad48... by Skia

Merge branch 'skia/log_viewer'

https://code.launchpad.net/~hyask/autopkgtest-cloud/+git/autopkgtest-cloud/+merge/467805

030fe6a... by Skia

feat: web: improve logs viewer

- better parse each test section:
  - preparing testbed
  - test run
  - test results
- icon is now the Ubuntu logo, without a request to Google

Succeeded
[SUCCEEDED] pre_commit:0 (build)
[SUCCEEDED] unit_tests:0 (build)
[SUCCEEDED] build_charms:0 (build)