Merge autopkgtest-cloud:capture-and-proceeed-influxdb-client-error into autopkgtest-cloud:master

Proposed by Brian Murray
Status: Merged
Merged at revision: 75de188e3e32abffe983770297ccf05cd7a0795d
Proposed branch: autopkgtest-cloud:capture-and-proceeed-influxdb-client-error
Merge into: autopkgtest-cloud:master
Diff against target: 33 lines (+7/-1)
1 file modified
charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker (+7/-1)
Reviewer                       Review Type    Date Requested    Status
Paride Legovini (community)                                     Approve
Ubuntu Release Team                                             Pending
Review via email: mp+426312@code.launchpad.net

Description of the change

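The armhf runners fell over when they were no longer able to submit statistics to InfluxDB. While this isn't great, we shouldn't stop running tests just because we can't submit statistics. This change catches InfluxDBClientError in submit_metric, logs the error, and lets the worker carry on.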

Jul 04 23:09:44 juju-4d1272-prod-proposed-migration-5 /home/ubuntu/autopkgtest-cloud/worker/worker[1711944]: INFO: Running /home/ubuntu/autopkgtest/runner/autopkgtest --output-dir /tmp/autopkgtest-work.hdf_lxai/out --timeout-copy=6000 --setup-commands sed -i "s/ports.ubuntu.com/ftpmaster.internal/; s/ubuntu-ports/ubuntu/" /etc/apt/sources.list `ls /etc/apt/sources.list.d/*.list 2>/dev/null || true`; ln -s /dev/null /etc/systemd/system/bluetooth.service; printf "http_proxy=http://squid.internal:3128\nhttps_proxy=http://squid.internal:3128\nno_proxy=127.0.0.1,127.0.1.1,localhost,localdomain,novalocal,internal,archive.ubuntu.com,ports.ubuntu.com,security.ubuntu.com,ddebs.ubuntu.com,changelogs.ubuntu.com,launchpad.net,10.24.0.0/24\n" >> /etc/environment --apt-pocket=proposed=src:init-system-helpers --apt-upgrade userv --timeout-short=300 --timeout-copy=20000 --timeout-build=20000 --env=ADT_TEST_TRIGGERS=init-system-helpers/1.64 -- lxd -r lxd-armhf-10.44.124.7 lxd-armhf-10.44.124.7:autopkgtest/ubuntu/kinetic/armhf
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 /home/ubuntu/autopkgtest-cloud/worker/worker[1711944]: INFO: autopkgtest exited with code 0
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: Traceback (most recent call last):
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: File "/home/ubuntu/autopkgtest-cloud/worker/worker", line 1102, in <module>
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: main()
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: File "/home/ubuntu/autopkgtest-cloud/worker/worker", line 1095, in main
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: queue.wait()
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: File "/usr/lib/python3/dist-packages/amqplib/client_0_8/abstract_channel.py", line 97, in wait
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: return self.dispatch_method(method_sig, args, content)
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: File "/usr/lib/python3/dist-packages/amqplib/client_0_8/abstract_channel.py", line 117, in dispatch_method
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: return amqp_method(self, args, content)
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: File "/usr/lib/python3/dist-packages/amqplib/client_0_8/channel.py", line 2060, in _basic_deliver
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: func(msg)
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: File "/home/ubuntu/autopkgtest-cloud/worker/worker", line 856, in request
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: submit_metric(architecture, code, pkgname, current_region, False, release)
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: File "/home/ubuntu/autopkgtest-cloud/worker/worker", line 169, in submit_metric
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: influx_client.write_points([point])
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: File "/usr/lib/python3/dist-packages/influxdb/client.py", line 486, in write_points
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: return self._write_points(points=points,
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: File "/usr/lib/python3/dist-packages/influxdb/client.py", line 547, in _write_points
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: self.write(
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: File "/usr/lib/python3/dist-packages/influxdb/client.py", line 321, in write
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: self.request(
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: File "/usr/lib/python3/dist-packages/influxdb/client.py", line 286, in request
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: raise InfluxDBClientError(response.content, response.status_code)
Jul 04 23:23:00 juju-4d1272-prod-proposed-migration-5 sh[1711944]: influxdb.exceptions.InfluxDBClientError: 400: {"error":"partial write: max-series-per-database limit exceeded: (1000000) dropped=1"}
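The traceback shows the InfluxDBClientError escaping submit_metric and unwinding through the AMQP callback into main(), which ends the worker even though autopkgtest itself exited with code 0. A minimal sketch of the intended behaviour follows. FakeInfluxClient, handle_request and the local InfluxDBClientError class are stand-ins invented for illustration so the snippet runs without an InfluxDB server; the real worker uses influxdb.InfluxDBClient and influxdb.exceptions.InfluxDBClientError.

```python
import logging


class InfluxDBClientError(Exception):
    """Stand-in for influxdb.exceptions.InfluxDBClientError (assumption)."""


class FakeInfluxClient:
    """Hypothetical client whose writes are always rejected, as in the log."""

    def write_points(self, points):
        raise InfluxDBClientError(
            '400: {"error":"partial write: max-series-per-database limit exceeded"}'
        )


influx_client = FakeInfluxClient()


def submit_metric(point):
    """Try to record a metric; log and carry on if the write is rejected."""
    try:
        influx_client.write_points([point])
    except InfluxDBClientError as err:
        logging.error("Write to InfluxDB failed: %s", err)
        return False
    return True


def handle_request(pkgname):
    """Simplified request handler: the test result survives a metrics failure."""
    code = 0  # pretend autopkgtest exited with code 0
    submit_metric({"measurement": "example", "tags": {"pkg": pkgname}})
    return code


result = handle_request("userv")
```

With the exception handled inside submit_metric, the caller still gets its exit code back and the queue callback returns normally.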

Brian Murray (brian-murray) wrote :

IS did increase the `max-series-per-database` setting so we shouldn't encounter this specific error any more, but it still seems wrong to me that all tests quit running because we can't write to Influx.
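This MP only catches InfluxDBClientError. If other failures (a server-side error or a dropped connection, say) should also be non-fatal, one option is a small decorator that logs a chosen set of exception types instead of propagating them. The sketch below is not part of this change; best_effort, ClientError and ServerError are hypothetical names, and the exception classes are stand-ins rather than the influxdb library's own.

```python
import functools
import logging


def best_effort(*exc_types):
    """Decorator: log the listed exception types instead of propagating them."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except exc_types as err:
                logging.error("%s failed: %s", func.__name__, err)
                return None

        return wrapper

    return decorator


class ClientError(Exception):
    """Stand-in for a 4xx response from the metrics backend (assumption)."""


class ServerError(Exception):
    """Stand-in for a 5xx response from the metrics backend (assumption)."""


@best_effort(ClientError, ServerError, ConnectionError)
def submit_metric(fail_with=None):
    """Pretend to write a point; raise fail_with to simulate a failure."""
    if fail_with is not None:
        raise fail_with
    return "written"
```

Exceptions that are not listed still propagate, so genuine programming errors in the worker are not hidden.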

Paride Legovini (paride) wrote :

LGTM!

review: Approve
Iain Lane (laney) wrote :

To me this speaks to a need to have monitoring for the service.

It shouldn't quit, so this MP is right, thanks for that - but it _should_ raise an alert to the team.
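As a rough illustration of "keep running, but make noise": the worker could count consecutive metric failures and escalate the log level once a threshold is crossed, so that log-based alerting has something to match. This is only a sketch under assumptions; MetricFailureTracker, the threshold of 3 and alerting on CRITICAL log lines are hypothetical and not something this MP or the service currently provides.

```python
import logging


class MetricFailureTracker:
    """Hypothetical helper: escalate after repeated metric write failures."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record_success(self):
        """A successful write resets the failure streak."""
        self.consecutive_failures = 0

    def record_failure(self, err):
        """Log the failure; return True once the streak reaches the threshold."""
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold:
            logging.critical(
                "Metrics unavailable for %d consecutive writes: %s",
                self.consecutive_failures,
                err,
            )
            return True
        logging.error("Write to InfluxDB failed: %s", err)
        return False
```

The worker never exits because of this tracker; it only changes how loudly the failure is reported.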

Preview Diff

diff --git a/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker b/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker
index 2d2bd45..9081e13 100755
--- a/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker
+++ b/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker
@@ -29,6 +29,7 @@ import swiftclient
 import systemd.journal
 
 from influxdb import InfluxDBClient
+from influxdb.exceptions import InfluxDBClientError
 from urllib.error import HTTPError
 
 ALL_RELEASES = distro_info.UbuntuDistroInfo().get_all(result='object')
@@ -166,7 +167,11 @@ def submit_metric(architecture, code, pkgname, current_region, retry, release):
             "series": release,
         },
     }
-    influx_client.write_points([point])
+    try:
+        influx_client.write_points([point])
+    except InfluxDBClientError as err:
+        logging.error("Write to InfluxDB failed: %s" % err)
+        return
 
 
 def getglob(d, glob, default=None):
@@ -848,6 +853,7 @@ def request(msg):
                             logging.warning('Three fails in a row - considering this a failure rather than tmpfail')
                             code = 4
                         else:
+                            # 2022-07-05 what code is passed to submit_metric in this code path?
                             submit_metric(architecture, code, pkgname, current_region, False, release)
                             logging.error('Three tmpfails in a row, aborting worker. Log follows:')
                             logging.error(log_contents(out_dir))
