Comment 11 for bug 2029417

Revision history for this message
Jacopo Rota (r00ta) wrote :

As per mattermost chat, I think I finally found why the loseConnection was not called as I suspected last week: It is done on purpuse in our code according to https://git.launchpad.net/maas/tree/src/provisioningserver/rpc/common.py#n318 .
Looking at the defailt implementation of twisted, unhandledError would call loseConnection! https://github.com/twisted/twisted/blob/a9ee8e59a5cd1950bf1a18c4b6ca813e5f56ad08/src/twisted/protocols/amp.py#L2507 but this behaviour was overwritten in 2015 for https://bugs.launchpad.net/maas/+bug/1457799 .
The proof is that in the new logs it's much more clean that the number of connections are wrong exactly when we hit the exceptionUnhandled failure during AMP request (see the screenshot, the blue vertical dashed line is put when we see those Unhandled failure during AMP request exceptions - It's now evident that the DHCP exceptions start after we hit the AMP exception)

So I think the fix could simply be something like

```
    def unhandledError(self, failure):
        """Terminal errback, after application code has seen the failure.

        `amp.BoxDispatcher.unhandledError` calls the `amp.IBoxSender`'s
        `unhandledError`. In the default implementation this disconnects the
        transport.

        Here we instead log the failure but do *not* disconnect because it's
        too disruptive to the running of MAAS.
        """
        if failure.check(builtins.RuntimeError) and "The handler is closed" in failure.getErrorMessage():
            super().unhandledError(failure)
            log.info("The handler is closed, the connection will be dropped.")
        else:
            log.err(
                failure,
                (
                    "Unhandled failure during AMP request. This is probably a bug. "
                    "Please ensure that this error is handled within application "
                    "code."
                ),
            )
```
can anybody in the team double check my latest findings above and send a MP? I'm currently off and I can't follow up in the working hours of the team