Merge autopkgtest-cloud:capture-and-proceeed-influxdb-client-error into autopkgtest-cloud:master

Proposed by Brian Murray
Status: Merged
Merged at revision: 75de188e3e32abffe983770297ccf05cd7a0795d
Proposed branch: autopkgtest-cloud:capture-and-proceeed-influxdb-client-error
Merge into: autopkgtest-cloud:master
Diff against target: 33 lines (+7/-1)
1 file modified
charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker (+7/-1)
Reviewer                      Review Type   Date Requested   Status
Paride Legovini (community)                                  Approve
Ubuntu Release Team                                          Pending
Review via email: mp+426312@code.launchpad.net

Description of the change

The armhf runners fell over when they were no longer able to submit statistics to InfluxDB. While this isn't great, we shouldn't stop running tests just because we can't submit statistics.

Jul 04 23:09:44 juju-4d1272-prod-proposed-migration-5 /home/ubuntu/autopkgtest-cloud/worker/worker[1711944]: INFO: Running /home/ubuntu/autopkgtest/runner/autopkgtest --output-dir /tmp/autopkgtest-work.hdf_lxai/out --timeout-copy=6000 --setup-commands sed -i "s/ports.ubuntu.com/ftpmaster.internal/; s/ubuntu-ports/ubuntu/" /etc/apt/sources.list `ls /etc/apt/sources.list.d/*.list 2>/dev/null || true`; ln -s /dev/null /etc/systemd/system/bluetooth.service; printf "http_proxy=http://squid.internal:3128\nhttps_proxy=http://squid.internal:3128\nno_proxy=127.0.0.1,127.0.1.1,localhost,localdomain,novalocal,internal,archive.ubuntu.com,ports.ubuntu.com,security.ubuntu.com,ddebs.ubuntu.com,changelogs.ubuntu.com,launchpad.net,10.24.0.0/24\n" >> /etc/environment --apt-pocket=proposed=src:init-system-helpers --apt-upgrade userv --timeout-short=300 --timeout-copy=20000 --timeout-build=20000 --env=ADT_TEST_TRIGGERS=init-system-helpers/1.64 -- lxd -r lxd-armhf-10.44.124.7 lxd-armhf-10.44.124.7:autopkgtest/ubuntu/kinetic/armhf
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 /home/ubuntu/autopkgtest-cloud/worker/worker[1711944]: INFO: autopkgtest exited with code 0
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: Traceback (most recent call last):
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: File "/home/ubuntu/autopkgtest-cloud/worker/worker", line 1102, in <module>
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: main()
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: File "/home/ubuntu/autopkgtest-cloud/worker/worker", line 1095, in main
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: queue.wait()
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: File "/usr/lib/python3/dist-packages/amqplib/client_0_8/abstract_channel.py", line 97, in wait
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: return self.dispatch_method(method_sig, args, content)
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: File "/usr/lib/python3/dist-packages/amqplib/client_0_8/abstract_channel.py", line 117, in dispatch_method
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: return amqp_method(self, args, content)
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: File "/usr/lib/python3/dist-packages/amqplib/client_0_8/channel.py", line 2060, in _basic_deliver
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: func(msg)
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: File "/home/ubuntu/autopkgtest-cloud/worker/worker", line 856, in request
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: submit_metric(architecture, code, pkgname, current_region, False, release)
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: File "/home/ubuntu/autopkgtest-cloud/worker/worker", line 169, in submit_metric
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: influx_client.write_points([point])
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: File "/usr/lib/python3/dist-packages/influxdb/client.py", line 486, in write_points
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: return self._write_points(points=points,
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: File "/usr/lib/python3/dist-packages/influxdb/client.py", line 547, in _write_points
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: self.write(
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: File "/usr/lib/python3/dist-packages/influxdb/client.py", line 321, in write
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: self.request(
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: File "/usr/lib/python3/dist-packages/influxdb/client.py", line 286, in request
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: raise InfluxDBClientError(response.content, response.status_code)
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: influxdb.exceptions.InfluxDBClientError: 400: {"error":"partial write: max-series-per-database limit exceeded: (1000000) dropped=1"}
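For readers skimming the traceback, the failure mode can be reduced to a minimal sketch: the queue callback raises after the test has already finished, nothing between the callback and main() catches the exception, and so the remaining requests are never picked up. Everything below is hypothetical and simplified. `MetricsWriteError` stands in for `influxdb.exceptions.InfluxDBClientError`, and `wait()` is only a rough analogue of the AMQP consumer loop, not the amqplib implementation.

```python
class MetricsWriteError(Exception):
    """Hypothetical stand-in for influxdb.exceptions.InfluxDBClientError."""


def submit_metric(point):
    """Stand-in for the statistics write; here the server always rejects it."""
    raise MetricsWriteError("400: max-series-per-database limit exceeded")


def request(msg, finished):
    """Queue callback: the test completes, then the statistics write fails."""
    finished.append(msg)
    submit_metric({"package": msg})


def wait(queue, callback, finished):
    """Simplified consumer loop: it does not catch what the callback raises."""
    for msg in queue:
        callback(msg, finished)


def run_worker(queue):
    """Return (finished requests, whether the loop died on a metrics error)."""
    finished = []
    try:
        wait(queue, request, finished)
    except MetricsWriteError:
        # The real worker has no handler at all, so the process exits here.
        # This sketch catches the error only to report what happened.
        return finished, True
    return finished, False
```

Calling `run_worker(["userv", "dpkg", "apt"])` in this sketch processes only the first request before the loop dies, which mirrors what the log shows: autopkgtest exits with code 0 and the worker then stops on the failed write.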

Brian Murray (brian-murray) wrote :

IS did increase the `max-series-per-database` setting so we shouldn't encounter this specific error any more, but it still seems wrong to me that all tests quit running because we can't write to Influx.

Paride Legovini (paride) wrote :

LGTM!

review: Approve
Iain Lane (laney) wrote :

To me this speaks to a need to have monitoring for the service.

It shouldn't quit, so this MP is right, thanks for that - but it _should_ raise an alert to the team.
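One way to get both behaviours (keep running, but make the problem visible) might look like the sketch below. It is not part of this MP and is only an assumption about how alerting could be hooked in: `MetricSubmitter`, `alert_after`, and the `worker.metrics` logger name are hypothetical, and `MetricsWriteError` stands in for `influxdb.exceptions.InfluxDBClientError`. The idea is to log each failed write at ERROR and escalate to CRITICAL once several writes in a row have failed, so that whatever monitoring watches the journal has something to alert on.

```python
import logging

# Hypothetical logger name; the real worker logs through the root logger.
logger = logging.getLogger("worker.metrics")


class MetricsWriteError(Exception):
    """Hypothetical stand-in for influxdb.exceptions.InfluxDBClientError."""


class MetricSubmitter:
    """Best-effort metric writer that never raises on a rejected write."""

    def __init__(self, write, alert_after=3):
        # `write` is any callable taking a list of points, for example
        # the write_points method of an InfluxDB client.
        self.write = write
        self.alert_after = alert_after
        self.consecutive_failures = 0

    def submit(self, point):
        """Return True if the point was written, False if the write failed."""
        try:
            self.write([point])
        except MetricsWriteError as err:
            self.consecutive_failures += 1
            # Escalate once failures persist, so monitoring can alert on it.
            if self.consecutive_failures >= self.alert_after:
                level = logging.CRITICAL
            else:
                level = logging.ERROR
            logger.log(level, "Write to InfluxDB failed (%d in a row): %s",
                       self.consecutive_failures, err)
            return False
        # A successful write resets the failure streak.
        self.consecutive_failures = 0
        return True
```

The threshold of three is arbitrary. The point is only that the test loop keeps going while a persistent outage produces a distinct, higher-severity signal.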

Preview Diff

diff --git a/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker b/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker
index 2d2bd45..9081e13 100755
--- a/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker
+++ b/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker
@@ -29,6 +29,7 @@ import swiftclient
 import systemd.journal
 
 from influxdb import InfluxDBClient
+from influxdb.exceptions import InfluxDBClientError
 from urllib.error import HTTPError
 
 ALL_RELEASES = distro_info.UbuntuDistroInfo().get_all(result='object')
@@ -166,7 +167,11 @@ def submit_metric(architecture, code, pkgname, current_region, retry, release):
             "series": release,
         },
     }
-    influx_client.write_points([point])
+    try:
+        influx_client.write_points([point])
+    except InfluxDBClientError as err:
+        logging.error("Write to InfluxDB failed: %s" % err)
+        return
 
 
 def getglob(d, glob, default=None):
@@ -848,6 +853,7 @@ def request(msg):
                     logging.warning('Three fails in a row - considering this a failure rather than tmpfail')
                     code = 4
                 else:
+                    # 2022-07-05 what code is passed to submit_metric in this code path?
                     submit_metric(architecture, code, pkgname, current_region, False, release)
                     logging.error('Three tmpfails in a row, aborting worker. Log follows:')
                     logging.error(log_contents(out_dir))
