Add service w/ watchdog to handle usd-importer failures

Bug #1838954 reported by Christian Ehrhardt 
Affects     Status        Importance  Assigned to       Milestone
git-ubuntu  Fix Released  High        Bryce Harrington

Bug Description

Various error situations can cause the importer to hang (see LP: #1745211) or fail (this bug). LP: #1765219 dealt with some system failure conditions, such as running out of disk space, but various other situations could still cause the importer to stop.

Currently, these failures are handled by manually monitoring and restarting the importer script. A more robust (if somewhat brute-force) solution would be to invoke the script via a systemd service, with a watchdog that detects whether the importer's main loop is still operational and restarts the service if it is not.
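
To illustrate the watchdog idea, here is a minimal sketch of how a Python main loop could report liveness to systemd. It is not the code from the eventual fix. It assumes the unit is configured with Type=notify and a WatchdogSec= value, in which case systemd exports NOTIFY_SOCKET and WATCHDOG_USEC to the process. The function names (sd_notify, watchdog_interval_seconds, main_loop) and the do_one_iteration callback are hypothetical.

```python
import os
import socket


def sd_notify(message: str) -> bool:
    """Send one notification datagram to systemd.

    Returns False when NOTIFY_SOCKET is unset, i.e. when the process
    was not started by systemd with notifications enabled.
    """
    address = os.environ.get("NOTIFY_SOCKET")
    if not address:
        return False
    if address.startswith("@"):
        # A leading '@' denotes an abstract namespace socket (NUL prefix).
        address = "\0" + address[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), address)
    return True


def watchdog_interval_seconds(default: float = 30.0) -> float:
    """Return half of the configured watchdog timeout, in seconds."""
    usec = os.environ.get("WATCHDOG_USEC")
    if not usec:
        return default
    return int(usec) / 1_000_000 / 2


def main_loop(do_one_iteration, iterations: int) -> None:
    """Run the (hypothetical) importer step, pinging systemd after each pass."""
    sd_notify("READY=1")
    for _ in range(iterations):
        do_one_iteration()
        # If the loop hangs, this ping stops and systemd's watchdog fires.
        sd_notify("WATCHDOG=1")
```

With this approach, a hang inside do_one_iteration stops the WATCHDOG=1 pings; once WatchdogSec= elapses, systemd treats the service as failed, and a Restart= policy such as on-failure brings it back up. Each iteration therefore has to finish well within the timeout, or the loop needs to ping at intermediate points.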

Error detection is currently handled manually as well, by visual inspection of the screen session for stack traces or evidence of hangs. With the introduction of a service daemon, the script output would be logged to a (logrotate'd) file. A new error detection/reporting process would need to be added to email the relevant log snippet to the administration mailing list.
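
As a rough sketch of the error-reporting step, the following pulls the most recent Python traceback out of a chunk of log text and wraps it in an email message. It only illustrates the idea and is not the reporting service that was eventually implemented. The function names and the sender/recipient parameters are assumptions, and actually sending the message (for example via smtplib) is left out.

```python
from email.message import EmailMessage


def last_traceback(log_text: str) -> str:
    """Return the last Python traceback found in log_text, or ''."""
    lines = log_text.splitlines()
    start = None
    for index, line in enumerate(lines):
        if line.startswith("Traceback (most recent call last):"):
            start = index  # keep overwriting so the last match wins
    if start is None:
        return ""
    snippet = [lines[start]]
    for line in lines[start + 1:]:
        snippet.append(line)
        # Frame lines are indented; the first unindented line is the
        # exception summary, which ends the traceback.
        if line and not line[0].isspace():
            break
    return "\n".join(snippet)


def build_report(log_text: str, sender: str, recipient: str):
    """Build an EmailMessage carrying the traceback, or None if there is none."""
    snippet = last_traceback(log_text)
    if not snippet:
        return None
    message = EmailMessage()
    message["Subject"] = "importer failure: " + snippet.splitlines()[-1]
    message["From"] = sender
    message["To"] = recipient
    message.set_content(snippet)
    return message
```

Putting the exception summary line in the subject makes recurring failures (such as "Connection reset by peer") easy to spot in a mailing list archive without opening each message.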

Installation of the service daemon script will eventually need to be done by the snapd installation process, but initially we'll just let the script be manually installed by the system admins.

[Original Report]
Hi,
I found the importer down with the following message:

Examining publishes in debian since 2019-08-04 04:39:52
Traceback (most recent call last):
  File "/snap/git-ubuntu/456/bin/import-source-packages.py", line 361, in <module>
    cli_main()
  File "/snap/git-ubuntu/456/bin/import-source-packages.py", line 357, in cli_main
    only_request_new_imports_once=args.only_request_new_imports_once,
  File "/snap/git-ubuntu/456/bin/import-source-packages.py", line 244, in main
    request_new_imports(pkgnames_to_import, num_days_ago, service_state)
  File "/snap/git-ubuntu/456/bin/import-source-packages.py", line 94, in request_new_imports
    dist = launchpad.distributions[dist_name]
  File "/snap/git-ubuntu/456/lib/python3.6/site-packages/lazr/restfulclient/resource.py", line 998, in __getitem__
    shim_resource._ensure_representation()
  File "/snap/git-ubuntu/456/lib/python3.6/site-packages/lazr/restfulclient/resource.py", line 379, in _ensure_representation
    representation = self._root._browser.get(self._wadl_resource)
  File "/snap/git-ubuntu/456/lib/python3.6/site-packages/lazr/restfulclient/_browser.py", line 448, in get
    response, content = self._request(url, extra_headers=headers)
  File "/snap/git-ubuntu/456/lib/python3.6/site-packages/lazr/restfulclient/_browser.py", line 399, in _request
    str(url), method=method, body=data, headers=headers)
  File "/snap/git-ubuntu/456/lib/python3.6/site-packages/lazr/restfulclient/_browser.py", line 369, in _request_and_retry
    url, method=method, body=body, headers=headers)
  File "/snap/git-ubuntu/456/lib/python3.6/site-packages/httplib2/__init__.py", line 1911, in request
    cachekey,
  File "/snap/git-ubuntu/456/lib/python3.6/site-packages/launchpadlib/launchpad.py", line 136, in _request
    LaunchpadOAuthAwareHttp, self)._request(*args)
  File "/snap/git-ubuntu/456/lib/python3.6/site-packages/lazr/restfulclient/_browser.py", line 195, in _request
    redirections, cachekey)
  File "/snap/git-ubuntu/456/lib/python3.6/site-packages/httplib2/__init__.py", line 1618, in _request
    conn, request_uri, method, body, headers
  File "/snap/git-ubuntu/456/lib/python3.6/site-packages/httplib2/__init__.py", line 1556, in _conn_request
    response = conn.getresponse()
  File "/snap/git-ubuntu/456/usr/lib/python3.6/http/client.py", line 1331, in getresponse
    response.begin()
  File "/snap/git-ubuntu/456/usr/lib/python3.6/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/snap/git-ubuntu/456/usr/lib/python3.6/http/client.py", line 258, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/snap/git-ubuntu/456/usr/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "/snap/git-ubuntu/456/usr/lib/python3.6/ssl.py", line 1009, in recv_into
    return self.read(nbytes, buffer)
  File "/snap/git-ubuntu/456/usr/lib/python3.6/ssl.py", line 871, in read
    return self._sslobj.read(len, buffer)
  File "/snap/git-ubuntu/456/usr/lib/python3.6/ssl.py", line 631, in read
    v = self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer

P.S. I haven't found another import-tagged bug with that signature; feel free to mark this as a duplicate if there is one.

Tags: import

tags: added: import
Bryce Harrington (bryce)
Changed in usd-importer:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Bryce Harrington (bryce)
summary: - importer failed with "Connection reset by peer"
+ Add service w/ watchdog to handle usd-importer failures
Bryce Harrington (bryce)
description: updated
Bryce Harrington (bryce) wrote :

Fixed in the following commit (and refined in a couple of subsequent commits). The change has landed in production, and the snaps have been updated.

commit 5d104bce612db56f3274c667ee7b008da5e68c18
Author: Bryce Harrington <email address hidden>
Date: Thu Oct 3 07:19:19 2019 -0700

    Implement a systemd watchdog daemon to run import-source-packages.py

    Git Ubuntu's package importing functionality is invoked via the
    import-source-packages.py script. Previously, this script would be
    manually started, and on error needed manual intervention.

    Instead, wrap the script in a systemd service that starts it up
    initially and restarts it on crash. A watchdog timer is used to detect
    if the script has hung, and restarts it after a suitable delay.

    Another service is added for sending emails when the service crashes,
    extracting status from the journal. Errors can also be reviewed using
    journalctl normally.

    By default, everything is configured to be installable in production,
    but configuration considerations are covered in documentation. There
    are no unit tests for this, however some testing/validation tips are
    identified in the documentation.

    LP: #1838954

Changed in usd-importer:
status: Triaged → Fix Released