rabbitmq-server 3.9.13 Mnesia Collector Throws Error

Bug #1988283 reported by Jasper Ras
This bug affects 1 person
 Affects                  | Status       | Importance | Assigned to | Milestone
 RabbitMQ                 | Fix Released | Unknown    |             |
 rabbitmq-server (Ubuntu) | Incomplete   | Undecided  | Unassigned  |
 Jammy                    | Incomplete   | Undecided  | Unassigned  |

Bug Description

[ Impact ]

Background:
Mnesia is a distributed database that RabbitMQ uses to store information
about users, exchanges, queues, and bindings. Messages, however, are not
stored in the database. More at
https://www.erlang.org/doc/apps/mnesia/mnesia_overview
For monitoring, rabbitmq integrates prometheus.erl
https://github.com/deadtrickster/prometheus.erl/blob/master/doc/README.md

 * In the current version of rabbitmq in Jammy and Kinetic there is a bug
   in the embedded prometheus.erl, which crashes when it reads that mnesia
   info from rabbitmq and gets "undefined" back.

 * Because this case is not handled gracefully, the collector crashes,
   as outlined in more detail at
   https://github.com/deadtrickster/prometheus.erl/issues/133

 * The fix itself is not too complex: an abstraction of mnesia:table_info
   that now catches the "undefined" case and maps it to "0".
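
The pattern behind the fix can be sketched as follows. This is an illustration in Python, not the upstream Erlang patch: `safe_table_info`, `total_memory_bytes`, `fake_table_info` and the table names are hypothetical, `None` stands in for Erlang's `undefined`, and the assumption that the collector adds up a per-table "memory" figure is made only for the example. The upstream commit remains the authoritative change.

```python
from typing import Callable, Iterable, Optional

# Hypothetical stand-in for mnesia:table_info/2; None plays the role of
# Erlang's 'undefined'.
TableInfo = Callable[[str, str], Optional[int]]


def safe_table_info(table_info: TableInfo, table: str, item: str) -> int:
    """Return the looked-up value, or 0 when the lookup yields None."""
    value = table_info(table, item)
    return 0 if value is None else value


def total_memory_bytes(table_info: TableInfo, tables: Iterable[str],
                       word_size: int = 8) -> int:
    """Sum per-table memory (in words) and convert it to bytes."""
    words = sum(safe_table_info(table_info, t, "memory") for t in tables)
    return words * word_size


# Fake data for illustration only: one table reports no memory figure.
_FAKE = {("users", "memory"): 100, ("no_figure", "memory"): None}


def fake_table_info(table: str, item: str) -> Optional[int]:
    return _FAKE.get((table, item))
```

Calling `total_memory_bytes(fake_table_info, ["users", "no_figure"])` completes because the missing figure is counted as 0. Summing the raw lookups instead would fail on the `None` entry, which is the Python analogue of the collector crash.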

[ Test Plan ]

 * To be defined after reporter feedback

[ Where problems could occur ]

 * When looking at rabbitmq overall this seems to be a corner case.
   The code change (and thereby the regression risk) would only affect
   those people using the prometheus_mnesia_collector, which is
   dysfunctional without this fix anyway.

 * We have all the usual "could always happen" issues like rebuilds
   introducing other dependencies or anything similar - but those exist
   whatever we do. Fortunately we are not messing with configs, services or
   anything else that is persistent.

[ Other Info ]

 * The patch is from prometheus.erl upstream (which does not exist in the
   archive on its own) and applied to rabbitmq which embeds the source.

---

The prometheus client in 3.9.13 contains a bug that crashes the prometheus_mnesia collector. It is described in more detail at https://github.com/deadtrickster/prometheus.erl/issues/133.

The bug is fixed in rabbitmq-server 3.9.15 as noted in https://github.com/rabbitmq/rabbitmq-server/pull/4376#issuecomment-1086772472

I'd like to request this package be bumped to 3.9.15 or the fix backported.

Paride Legovini (paride)
Changed in rabbitmq-server (Ubuntu):
status: New → Triaged
Paride Legovini (paride) wrote :

I didn't set up a reproducer for this one, but the situation looks pretty clear. Currently affected are:

 rabbitmq-server | 3.9.13-1 | jammy
 rabbitmq-server | 3.9.13-1 | kinetic

The 3.9.x releases are maintenance releases (see [1]), and as they appear to be bugfix only we could still fix this bug in Kinetic by packaging 3.9.15 (ahead of Debian).

I don't think the package has enough tests to do a "new upstream microrelease" upload: the autopkgtest is just a smoke test, and the upstream tests are disabled in d/rules, so fixing the bug in Jammy will probably require cherry-picking the right commit.

[1] https://rabbitmq.com/github.html

Changed in rabbitmq-server (Ubuntu Jammy):
status: New → Triaged
tags: added: server-todo
Christian Ehrhardt  (paelzer) wrote :

The breakage is in https://github.com/deadtrickster/prometheus.erl 4.8.1, which rabbitmq uses as embedded code :-/ The "fix" that landed in the rabbitmq git repository just updates the references to pull in the newer 4.8.2 (up from the former 06425c21, and now fetched via hex).

Indeed, 3.9.15 is the latest release of the 3.9.x series that combines all of that (it pulls in this fix and others).

For the SRU we have to look inside of prometheus.erl, where this was fixed via https://github.com/deadtrickster/prometheus.erl/pull/140 by this change:
https://github.com/deadtrickster/prometheus.erl/commit/ffe2bf711659f5ee11970b8a60f7f0f72fc68770

I'm not familiar enough with rabbitmq to move it to 3.9.15 so late in the Kinetic cycle.
But I've added it to the candidates of minor release upgrades. If things look stable enough that might happen later on.

For now I think we should fix the issue, ask the reporter to try it from a PPA, and then ask for help preparing an SRU case.

Christian Ehrhardt  (paelzer) wrote :

PPA: https://launchpad.net/~paelzer/+archive/ubuntu/lp-1988283-rabbitmq-crashes-mnesia/+packages

Hi Jasper,
could you give the test build in the referenced PPA a try?
Your confirmation that it fixes the issue should be enough to upload it to Kinetic.

For the SRU [1] process into Jammy I will need some help from you.
In particular, I need to know what someone would have to do on a fresh blank system to install and configure it so that the issue triggers. That is needed to verify that the problem exists and that it is fixed after the update.

I'll create the rest of the SRU template, but on that I'd really need your help.

Marking as incomplete until I get this information.

[1]: https://wiki.ubuntu.com/StableReleaseUpdates

Changed in rabbitmq-server (Ubuntu):
status: Triaged → Incomplete
Changed in rabbitmq-server (Ubuntu Jammy):
status: Triaged → Incomplete
Changed in rabbitmq-server (Ubuntu):
assignee: nobody → Christian Ehrhardt  (paelzer)
Changed in rabbitmq-server (Ubuntu Jammy):
assignee: nobody → Christian Ehrhardt  (paelzer)
Christian Ehrhardt  (paelzer) wrote :
description: updated
Christian Ehrhardt  (paelzer) wrote :

Hi, there was no further feedback last week or over the weekend.
I just wanted to clarify that I really need someone who is hitting this issue (and is able to recreate it) to help us drive this further into Kinetic, and from there as an SRU into Jammy.

Unassigning myself for now since I can't continue; if the bug is updated it will show up in triage.
@triagers - if the feedback is good feel free to directly convert the MPs into an upload then.

Changed in rabbitmq-server (Ubuntu):
assignee: Christian Ehrhardt  (paelzer) → nobody
Changed in rabbitmq-server (Ubuntu Jammy):
assignee: Christian Ehrhardt  (paelzer) → nobody
tags: removed: server-todo
Changed in rabbitmq:
status: Unknown → Fix Released