twisted Unhandled Error when region can't reach upstream boot resource

Bug #1386914 reported by Jason Hobbs
This bug affects 2 people
Affects: MAAS
Status: Fix Released
Importance: Critical
Assigned to: Julian Edwards

Bug Description

If the region controller can't reach its upstream boot resource URL when checking for images to sync, an unhandled exception is raised and a large stack trace is dumped to maas-django.log. Instead, we should alert the admin that the boot resource URL can't be reached.

Here's the stack trace:
ERROR 2014-10-22 10:41:04,913 twisted Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/python/threadpool.py", line 191, in _worker
    result = context.call(ctx, function, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 81, in callWithContext
    return func(*args,**kw)
  File "/usr/lib/python2.7/dist-packages/maasserver/utils/async.py", line 153, in call_within_transaction
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/maasserver/bootsources.py", line 117, in _cache_boot_sources
    image_descriptions = download_all_image_descriptions(sources)
  File "/usr/lib/python2.7/dist-packages/provisioningserver/import_images/download_descriptions.py", line 187, in download_all_image_descriptions
    source['url'], keyring=source.get('keyring', None))
  File "/usr/lib/python2.7/dist-packages/provisioningserver/import_images/download_descriptions.py", line 178, in download_image_descriptions
    dumper.sync(reader, rpath)
  File "/usr/lib/python2.7/dist-packages/simplestreams/mirrors/__init__.py", line 82, in sync
    content, payload = reader.read_json(path)
  File "/usr/lib/python2.7/dist-packages/simplestreams/mirrors/__init__.py", line 39, in read_json
    raw = self.source(path).read().decode('utf-8')
  File "/usr/lib/python2.7/dist-packages/simplestreams/contentsource.py", line 143, in read
    self.open()
  File "/usr/lib/python2.7/dist-packages/simplestreams/contentsource.py", line 139, in open
    self.fd = self._open()
  File "/usr/lib/python2.7/dist-packages/simplestreams/contentsource.py", line 127, in _open
    return opener(*oargs, offset=self.offset)
  File "/usr/lib/python2.7/dist-packages/simplestreams/contentsource.py", line 302, in __init__
    self.req = requests.get(url, stream=True, auth=auth, headers=headers)
  File "/usr/lib/python2.7/dist-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 455, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 558, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/adapters.py", line 378, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='192.168.201.2', port=80): Max retries exceeded with url: /MAAS/images-stream/streams/v1/index.json (Caused by <class 'socket.error'>: [Errno 110] Connection timed out)
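The root failure at the bottom of the trace is a plain TCP connect timeout ([Errno 110]). The `requests.get` call in simplestreams passes no `timeout`, so the sync blocks until the operating system gives up on the connection. One way to fail fast with a one-line reason is a short pre-flight reachability check. The sketch below is a hypothetical Python 3 helper (`probe_upstream` is not part of MAAS or simplestreams); it only tests whether a TCP connection to the URL's host and port succeeds and does not validate the stream itself.

```python
import socket
from typing import Optional
from urllib.parse import urlsplit


def probe_upstream(url: str, timeout: float = 5.0) -> Optional[str]:
    """Return None if a TCP connection to the URL's host succeeds, else a reason.

    Hypothetical pre-flight check. The explicit timeout bounds the wait
    instead of relying on the OS connect timeout.
    """
    parts = urlsplit(url)
    # Fall back to the scheme's default port when the URL omits one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        # create_connection returns a socket; closing it immediately is enough
        # to prove reachability.
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return None
    except OSError as exc:  # covers refused connections and socket timeouts
        return "cannot reach %s:%d: %s" % (parts.hostname, port, exc)
```

Called with the URL from the log (`http://192.168.201.2/MAAS/images-stream/streams/v1/index.json`), it would return either None or a single line naming the host, port 80, and the socket error, which is the kind of message an admin could act on.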


Julian Edwards (julian-edwards) wrote:

Crash, so Critical. This looks like a potential release blocker to me as well, since this is an easy mistake to make.

Changed in maas:
status: New → Triaged
importance: Undecided → Critical
milestone: none → 1.7.0
Jason Hobbs (jason-hobbs) wrote:

Is any unhandled Twisted error a crash? What's the impact of a crash like this in the region controller?

Julian Edwards (julian-edwards) wrote:

IIRC Twisted catches these errors internally, but at the cost of killing whatever service raised the error. I don't know whether Gavin added something to catch that in the region, though.
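For context on the mechanism: the `--- <exception caught here> ---` marker in the trace shows Twisted's thread pool catching the exception in its worker and logging it as "Unhandled Error", because no result callback was attached to that call. The usual Twisted remedy is to run such work through `deferToThread` and attach an errback, so the failure is handled explicitly. As a standard-library analogy only (this is not MAAS or Twisted code), the same pattern with `concurrent.futures` routes a worker's exception to a handler; `run_in_pool` and `on_failure` are hypothetical names.

```python
import concurrent.futures
from typing import Callable


def run_in_pool(
    pool: concurrent.futures.Executor,
    func: Callable[[], object],
    on_failure: Callable[[BaseException], None],
) -> concurrent.futures.Future:
    """Run func on a worker thread and pass any exception it raises to on_failure.

    Roughly analogous to deferToThread(func).addErrback(...) in Twisted: the
    error goes to an explicit handler instead of being logged as unhandled.
    """
    future = pool.submit(func)

    def _check(done: concurrent.futures.Future) -> None:
        if done.cancelled():
            return  # nothing ran, so there is no error to report
        exc = done.exception()  # None if func returned normally
        if exc is not None:
            on_failure(exc)

    # Invoked when func finishes (or immediately if it already has).
    future.add_done_callback(_check)
    return future
```

The handler is where a caller would record an admin-visible error rather than letting the traceback reach the log.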

Christian Reis (kiko) wrote:

Interesting; I wonder if we see this, for instance, when running behind a restrictive proxy, as in bug 1384464.

Changed in maas:
assignee: nobody → Julian Edwards (julian-edwards)
status: Triaged → In Progress
Julian Edwards (julian-edwards) wrote:

The user really needs to be notified of this situation, so I'm adding a persistent component error for failing imports.
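A minimal in-memory sketch of what a persistent component error for imports could look like, assuming a simple keyed registry: the message stays visible until a later import succeeds and clears it, rather than being logged once and lost. All function names and the `"import"` component key are hypothetical; a real region controller would need to store this state somewhere shared so that every region process and the web UI see the same errors.

```python
from typing import Dict, Optional

# Hypothetical component key for boot resource imports.
IMPORT_COMPONENT = "import"

# component -> message; in-memory only, for illustration.
_persistent_errors: Dict[str, str] = {}


def register_persistent_error(component: str, message: str) -> None:
    """Set (or replace) the error currently shown for a component."""
    _persistent_errors[component] = message


def discard_persistent_error(component: str) -> None:
    """Clear a component's error; a no-op if none is registered."""
    _persistent_errors.pop(component, None)


def get_persistent_errors() -> Dict[str, str]:
    """Return a copy of the current errors, e.g. to render as admin banners."""
    return dict(_persistent_errors)


def record_import_outcome(error: Optional[str]) -> None:
    """Call after every import attempt; error=None means it succeeded."""
    if error is None:
        discard_persistent_error(IMPORT_COMPONENT)
    else:
        register_persistent_error(IMPORT_COMPONENT, error)
```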

Christian Reis (kiko) wrote:

This should also check whether we were able to download the bootloader (and other boot resources, such as the ARM DTB). It would be nice to do this in centralized infrastructure.
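One possible reading of "centralized infrastructure": register every resource the import needs (images, bootloader, ARM DTB, and so on) as a named check, and have a single runner execute all of them, collecting failures instead of stopping at the first. This is a hypothetical sketch; `run_resource_checks` and the check names are illustrative, and each check is assumed to raise an exception when its resource is missing or could not be downloaded.

```python
from typing import Callable, Dict, List, Tuple

# A check returns normally if its resource is present and raises otherwise.
Check = Callable[[], None]


def run_resource_checks(checks: Dict[str, Check]) -> List[Tuple[str, str]]:
    """Run every check and collect (name, reason) for each one that failed.

    One failing resource does not stop the others from being checked, so the
    admin sees the full list of problems at once.
    """
    failures: List[Tuple[str, str]] = []
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:  # report the problem instead of crashing the caller
            failures.append((name, str(exc)))
    return failures
```

The returned list could then feed a single admin-facing error message covering all missing resources.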

Changed in maas:
milestone: 1.7.0 → 1.7.1
Blake Rouse (blake-rouse) wrote:

There is Twisted error handling in the CacheService but not in the cache_boot_sources function; moving the error handling into that function will handle this error correctly.
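A sketch of that suggestion, with hypothetical signatures and injected dependencies so it runs standalone; it is not taken from MAAS's actual fix. The download step seen in the trace is wrapped inside the caching function itself, so a connection failure becomes a one-line warning plus an admin-visible error, and the next successful run clears it. It assumes Python 3, where `requests.exceptions.RequestException` (and therefore its `ConnectionError`) derives from `IOError`, an alias of `OSError`; catching `OSError` thus covers both that and raw socket timeouts.

```python
import logging
from typing import Callable, Iterable, List, Optional

log = logging.getLogger("bootsources.sketch")


def cache_boot_sources(
    sources: Iterable[dict],
    download: Callable[[List[dict]], object],
    save: Callable[[object], None],
    set_error: Callable[[Optional[str]], None],
) -> bool:
    """Fetch and cache image descriptions, handling connection errors here.

    Hypothetical signature: `download` stands in for the image-description
    download, `save` for writing the cache, and `set_error` for an
    admin-visible error (None clears it). Returns True on success.
    """
    source_list = list(sources)
    try:
        descriptions = download(source_list)
    except OSError as exc:
        # One line in the log instead of a full traceback.
        message = "Failed to fetch image descriptions: %s" % exc
        log.warning("%s", message)
        set_error(message)
        return False
    save(descriptions)
    set_error(None)  # a successful fetch clears any earlier error
    return True
```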

Julian Edwards (julian-edwards) wrote: Re: [Bug 1386914] Re: twisted Unhandled Error when region can't reach upstream boot resource

Right, I need a deeper discussion with Blake about how all the parts of the importer fit together, so that I understand which parts download from where, by what mechanism, and when it all happens.

Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released