Comment 60 for bug 1874075

Matthew Ruffell (mruffell) wrote :

What happens (packages from -updates):

On Groovy: Assumed to behave the same as Focal, based on its systemd service file. Untested.

On Focal: The rabbitmq service starts and stays in the 'activating' state until the daemon notifies systemd that it has started up (Type=notify). Every 300 seconds (5 minutes) rabbitmq logs a failure to synchronise the message queue, and this repeats until rabbitmq2 returns, but the daemon never dies. TimeoutStartSec=3600 (one hour), so the daemon stays waiting for up to 1 hour, soft resetting every 5 minutes as queue synchronisation timeouts occur. The service only changes to 'active' once rabbitmq2 starts and the message queue is synced.
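For context on why the service sits in 'activating': with Type=notify, systemd waits for the daemon to send READY=1 as a datagram to the unix socket named in the NOTIFY_SOCKET environment variable. Below is a minimal Python sketch of that notification step. It is illustrative only and is not how rabbitmq itself does it; the helper name sd_notify is my own choice for this example.

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send a state string such as "READY=1" to systemd's notification socket.

    Returns False when NOTIFY_SOCKET is unset, i.e. the process was not
    started by a service manager that expects a readiness notification.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    # A leading "@" denotes a Linux abstract-namespace socket.
    if path.startswith("@"):
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode("utf-8"), path)
    return True
```

A daemon would call sd_notify("READY=1") once its startup work is complete. Until that message arrives, or TimeoutStartSec expires, systemd keeps the unit in 'activating'.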

From what I understand, I don't think there are any problems on Focal or Groovy. As long as rabbitmq2 comes up within an hour, things work. Note that because of this bug, Groovy and upstream have now been changed to a 10 minute timeout, down from 1 hour.

On Eoan: Assumed to behave the same as Focal, based on its systemd service file. Untested.

On Bionic: The rabbitmq service starts and runs an ExecStartPost script that waits on the rabbitmq daemon. If this ExecStartPost script times out (which it seems to do after 90 seconds, even though the documentation suggests an infinite timeout), it terminates with an error exit code, and since the unit is Type=simple, systemd marks the service as failed. There is no Restart=on-failure in Bionic's systemd unit, so rabbitmq stays dead. In short, rabbitmq dies 90 seconds after boot and will never rejoin the cluster by itself. The machine needs to be power cycled, or someone needs to ssh in manually and restart the rabbitmq services.

On Xenial: Assumed to behave the same as Bionic, based on its systemd service file. Untested.

Suggested actions:
For Bionic: From my understanding of the problem and my testing, replacing the systemd service file with the one from Focal solves the problem. That file changes Type=simple to Type=notify, with a 1 hour timeout and Restart=on-failure. Notes: I checked the source code, and rabbitmq in Bionic does indeed support Type=notify, although we need to add a dependency, socat, to the package. See the commit below for details:

commit: 2d6383bade61fea0b8652b72d25bb1a9f0d6133f
From: Alexey Lebedeff <email address hidden>
Date: Fri, 11 Mar 2016 17:42:15 +0300
Subject: Improve systemd integration
Link: https://github.com/rabbitmq/rabbitmq-server/commit/2d6383bade61fea0b8652b72d25bb1a9f0d6133f

GitHub issue for the above commit: https://github.com/rabbitmq/rabbitmq-server/issues/664
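To make the Bionic suggestion concrete, here is a rough Python sketch that renders only the three [Service] directives named above (Type=notify, a 1 hour start timeout, Restart=on-failure). It is not the actual Focal unit file, which contains more than this, and it does not cover the socat dependency. The function name render_override is hypothetical.

```python
import configparser
import io


def render_override(timeout_start_sec: int = 3600) -> str:
    """Render a [Service] section with the three directives discussed above."""
    unit = configparser.ConfigParser()
    # Keep directive names case-sensitive; configparser lowercases by default.
    unit.optionxform = str
    unit["Service"] = {
        "Type": "notify",
        "TimeoutStartSec": str(timeout_start_sec),
        "Restart": "on-failure",
    }
    buf = io.StringIO()
    # Write "Key=value" without spaces, matching the usual unit file style.
    unit.write(buf, space_around_delimiters=False)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_override())
```

Passing 600 instead of the default would give the 10 minute timeout that Groovy and upstream moved to.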

Xenial: I need to dig into this. We will likely follow the same path as Bionic, but we need to be careful to ensure that service Type=notify is sufficiently supported in rabbitmq 3.5.7 before we SRU the change. It will also likely need socat as a dependency, and maybe a backport of the above commit.