[2.4] Registering RPC connections with master can occur out of order

Bug #1765056 reported by Blake Rouse
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MAAS
Fix Released
Critical
Blake Rouse

Bug Description

When a regiond process is reporting the RPC connections to a master process the order of actions is not deterministic being async. That means a new re-connection can tell the master the process has that connection and an old disconnection defer can later come back and say it has disconnected. This out of order is what causes the master to think the regiond process doesn't have that RPC connection, when in reality it actually does have that connection.

Since the rack controller actually does have that connection it will not perform an re-connection to fix this issue.

018-04-17 21:13:43 maasserver.rpc.regionservice: [critical] Failed to register rack controller 'mar3yn' with the master. Connection will be dropped.
 Traceback (most recent call last):
   File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 567, in _startRunCallbacks
     self._runCallbacks()
   File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 653, in _runCallbacks
     current.result = callback(current.result, *args, **kw)
   File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1442, in gotResult
     _inlineCallbacks(r, g, deferred)
   File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
     result = g.send(result)
 --- <exception caught here> ---
   File "/usr/lib/python3/dist-packages/maasserver/rpc/regionservice.py", line 609, in _register
     yield self.initResponder(rack_controller)
   File "/usr/lib/python3/dist-packages/provisioningserver/utils/twisted.py", line 159, in wrapper
     return func(*args, **kwargs)
   File "/usr/lib/python3/dist-packages/maasserver/rpc/regionservice.py", line 532, in initResponder
     self.host = self.transport.getHost()
 builtins.AttributeError: 'NoneType' object has no attribute 'getHost'

2018-04-17 21:13:43 maasserver.ipc: [info] Worker pid:22906 lost RPC connection to ('mar3yn', '10.244.40.32', 5250).
2018-04-17 21:13:43 maasserver.ipc: [info] Worker pid:22905 lost RPC connection to ('mar3yn', '10.244.40.32', 5252).
2018-04-17 21:13:43 provisioningserver.utils.services: [info] mDNS observation process started.
2018-04-17 21:13:44 twisted.internet.protocol.Factory: [info] RegionServer connection established (HOST:IPv6Address(TCP, '::ffff:10.244.40.32', 5253) PEER:IPv6Address(TCP, '::ffff:10.244.40.32', 56264))
2018-04-17 21:13:44 twisted.internet.protocol.Factory: [info] RegionServer connection established (HOST:IPv6Address(TCP, '::ffff:10.244.40.32', 5252) PEER:IPv6Address(TCP, '::ffff:10.244.40.32', 35314))
2018-04-17 21:13:44 twisted.internet.protocol.Factory: [info] RegionServer connection established (HOST:IPv6Address(TCP, '::ffff:10.244.40.32', 5250) PEER:IPv6Address(TCP, '::ffff:10.244.40.32', 38564))
2018-04-17 21:13:44 twisted.internet.protocol.Factory: [info] RegionServer connection established (HOST:IPv6Address(TCP, '::ffff:10.245.219.208', 5251) PEER:IPv6Address(TCP, '::ffff:10.245.219.208', 37826))
2018-04-17 21:13:44 maasserver.rpc.regionservice: [info] Rack controller authenticated from '::ffff:10.244.40.32:56264'.
2018-04-17 21:13:44 maasserver.rpc.regionservice: [info] Rack controller authenticated from '::ffff:10.244.40.32:38564'.
2018-04-17 21:13:44 maasserver.rpc.regionservice: [info] Rack controller authenticated from '::ffff:10.244.40.32:35314'.
2018-04-17 21:13:44 maasserver.rpc.regionservice: [info] Rack controller authenticated from '::ffff:10.245.219.208:37826'.
2018-04-17 21:13:45 maasserver.ipc: [info] Worker pid:22906 registered RPC connection to ('mar3yn', '10.244.40.32', 5250).
2018-04-17 21:13:47 maasserver.ipc: [info] Worker pid:22907 registered RPC connection to ('mar3yn', '10.245.219.208', 5251).
2018-04-17 21:13:47 maasserver.ipc: [info] Worker pid:22905 registered RPC connection to ('mar3yn', '10.244.40.32', 5252).
2018-04-17 21:13:48 maasserver.ipc: [info] Worker pid:22904 registered RPC connection to ('mar3yn', '10.244.40.32', 5253).
2018-04-17 21:13:51 maasserver.ipc: [info] Worker pid:22904 lost RPC connection to ('mar3yn', '10.244.40.32', 5253).

Related branches

Changed in maas:
milestone: 2.4.0beta3 → 2.4.0rc1
description: updated
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
milestone: 2.4.0rc1 → 2.4.0beta3
Changed in maas:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.