[2.5] Cannot configure DHCP on a multi-node MAAS

Bug #1792031 reported by Andres Rodriguez
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
MAAS      Fix Released   High         Blake Rouse

Bug Description

On a multi-node MAAS I started seeing the following. I don't have a specific way to reproduce it, but I keep seeing this from time to time.

Note that the UI shows a failure on the rack controller (dhcpd - timed out) when this happens, but a few seconds later it fixes itself, and a few seconds after that it fails again. So this is intermittent.

2018-09-11 20:55:45 maasserver.dhcp: [info] Successfully configured DHCPv6 on rack controller 'ycqan6'.
2018-09-11 20:55:45 maasserver.rack_controller: [critical] Failed configuring DHCP on rack controller 'id:1'.
        Traceback (most recent call last):
          File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1442, in gotResult
            _inlineCallbacks(r, g, deferred)
          File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1432, in _inlineCallbacks
            deferred.errback()
          File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 500, in errback
            self._startRunCallbacks(fail)
          File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 567, in _startRunCallbacks
            self._runCallbacks()
        --- <exception caught here> ---
          File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 653, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "/usr/lib/python3/dist-packages/maasserver/rack_controller.py", line 256, in <lambda>
            d.addErrback(lambda f: f.trap(NoConnectionsAvailable))
          File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 359, in trap
            self.raiseException()
          File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 385, in raiseException
            raise self.value.with_traceback(self.tb)
          File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
            result = g.send(result)
          File "/usr/lib/python3/dist-packages/maasserver/dhcp.py", line 835, in configure_dhcp
            raise ipv4_exc
          File "/usr/lib/python3/dist-packages/maasserver/dhcp.py", line 773, in configure_dhcp
            omapi_key=config.omapi_key)
        provisioningserver.rpc.exceptions.CannotConfigureDHCP: timed out

Tags: sprint track

description: updated
Changed in maas:
milestone: none → 2.5.0beta2
importance: Undecided → Critical
status: New → Triaged
Andres Rodriguez (andreserl) wrote :

Restarting, saw this:

2018-09-11 21:02:19 maasserver.dhcp: [info] Successfully configured DHCPv6 on rack controller 'ycqan6'.
2018-09-11 21:02:19 maasserver.rack_controller: [critical] Failed configuring DHCP on rack controller 'id:1'.
        Traceback (most recent call last):
          File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1442, in gotResult
            _inlineCallbacks(r, g, deferred)
          File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1432, in _inlineCallbacks
            deferred.errback()
          File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 500, in errback
            self._startRunCallbacks(fail)
          File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 567, in _startRunCallbacks
            self._runCallbacks()
        --- <exception caught here> ---
          File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 653, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "/usr/lib/python3/dist-packages/maasserver/rack_controller.py", line 256, in <lambda>
            d.addErrback(lambda f: f.trap(NoConnectionsAvailable))
          File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 359, in trap
            self.raiseException()
          File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 385, in raiseException
            raise self.value.with_traceback(self.tb)
          File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
            result = g.send(result)
          File "/usr/lib/python3/dist-packages/maasserver/dhcp.py", line 835, in configure_dhcp
            raise ipv4_exc
          File "/usr/lib/python3/dist-packages/maasserver/dhcp.py", line 773, in configure_dhcp
            omapi_key=config.omapi_key)
        provisioningserver.rpc.exceptions.CannotConfigureDHCP: timed out

2018-09-11 21:02:22 maasserver.dhcp: [critical] Error configuring DHCPv4 on rack controller 'ycqan6':
        Traceback (most recent call last):
        Failure: twisted.internet.defer.CancelledError:

2018-09-11 21:02:22 maasserver.dhcp: [info] Successfully configured DHCPv6 on rack controller 'ycqan6'.
2018-09-11 21:02:22 maasserver.rack_controller: [critical] Failed configuring DHCP on rack controller 'id:1'.
        Traceback (most recent call last):
          File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1442, in gotResult
            _inlineCallbacks(r, g, deferred)
          File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1432, in _inlineCallbacks
            deferred.errback()
          File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 500, in errback
            self._startRunCallbacks(fail)
          File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 567, in _startRunCallbacks
            self._runCallbacks()
        --- <exception caught here> ---
          File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 653, in _runCallbacks
            current.result = callback(current.result, *args, **kw)...

Blake Rouse (blake-rouse) wrote :

This is not a critical error because it's not preventing MAAS from working; MAAS is handling the error and continuing to work properly.

This is the 30-second timeout that handles cases where the DHCP service gets stuck because of the service monitor.
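
To illustrate the kind of timeout described here, the following is a minimal sketch only, not MAAS code. MAAS uses Twisted, as the traceback shows, whereas this sketch uses the standard-library asyncio so that it runs on its own. The names stuck_call, configure_with_timeout and run_once are hypothetical, and CannotConfigureDHCP is a local stand-in for the exception named in the traceback. It shows a pending call being cancelled when the timeout fires and the failure then being reported as "timed out", which matches the CancelledError and "timed out" entries in the logs above.

```python
import asyncio


class CannotConfigureDHCP(Exception):
    """Hypothetical stand-in for the exception named in the traceback."""


async def stuck_call(events):
    # Simulates a configuration call that never completes on its own.
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        # The timeout cancels the pending call before the error is reported.
        events.append("cancelled")
        raise


async def configure_with_timeout(events, timeout=30.0):
    # Give up after `timeout` seconds instead of waiting forever.
    try:
        await asyncio.wait_for(stuck_call(events), timeout)
    except asyncio.TimeoutError:
        events.append("timed out")
        raise CannotConfigureDHCP("timed out") from None


def run_once(timeout):
    # Run one configuration attempt and record what happened, in order.
    events = []
    try:
        asyncio.run(configure_with_timeout(events, timeout))
    except CannotConfigureDHCP as exc:
        events.append("reported: %s" % exc)
    return events
```

Calling run_once(0.05) uses a short timeout so the sketch finishes quickly; the default of 30.0 mirrors the 30-second value mentioned in the comment. The caller survives the failure and can try again later, which is the "handles the error and continues" behaviour being described.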

Changed in maas:
importance: Critical → High
Andres Rodriguez (andreserl) wrote :

This is a critical error. It prevents MAAS from working because DHCP is not working even when it is enabled.

Blake Rouse (blake-rouse) wrote :

This is only a critical error if it occurs constantly, which would prevent DHCP from ever starting. If it occurs periodically and rackd recovers from the error and fixes DHCP, then it's only high.

Changed in maas:
assignee: nobody → Blake Rouse (blake-rouse)
tags: added: sprint track
Changed in maas:
status: Triaged → In Progress
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released