ibacm service fails if ib_umad module is not loaded

Bug #1794825 reported by Eugenio González
This bug affects 1 person
Affects              Status         Importance   Assigned to   Milestone
rdma-core (Ubuntu)   Fix Released   Undecided    Unassigned
Bionic               Fix Released   Undecided    Unassigned
Cosmic               Fix Released   Undecided    Unassigned

Bug Description

[Impact]

 * The service for ibacm needs special HW to be present

 * The sysV service has no way to express any such dependencies but dh*
   tools will start it in postinst. Due to that the service will fail to
   start.

 * That makes dpkg consider the installation failed in general (bad RC on
   service start)

 * We went back and forth upstream and in Debian (same people) but the
   solution seems to be to get rid of the sysV file.

[Test Case]

 * 1. Get to a system that has no InfiniBand HW and therefore does not have
   the ib_umad module loaded (any container will do).
 * 2. install ibacm
   Without fix:
    Configuring ibacm (17.1-1) ...
    Job for ibacm.service failed because the control process exited with
    error code.
    See "systemctl status ibacm.service" and "journalctl -xe" for details.
    invoke-rc.d: initscript ibacm, action "start" failed.
    ● ibacm.service - InfiniBand Address Cache Manager Daemon
       Active: failed[...]

  * With the fix the install will work
    For the unlikely case that you have an IB-capable system the
    service will load later when the socket is connected (via rdma)
      ListenNetlink=rdma 4
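
To tell the two outcomes of the test case apart without reading the install log, one can look at the dpkg status of the package. This is a minimal sketch, assuming the usual dpkg behaviour of leaving a package "half-configured" when its postinst fails; the helper name postinst_ok is hypothetical.

```shell
# postinst_ok STATUS: succeed only if a dpkg status string reports a fully
# configured package. A failed postinst normally leaves "half-configured".
postinst_ok() {
    case "$1" in
        *" installed") return 0 ;;
        *) return 1 ;;
    esac
}

# Query the real status only where dpkg-query is available.
if command -v dpkg-query >/dev/null 2>&1; then
    status=$(dpkg-query -W -f='${Status}' ibacm 2>/dev/null || true)
    if postinst_ok "$status"; then
        echo "ibacm: configured"
    else
        echo "ibacm: not configured (status: ${status:-unknown})"
    fi
fi
```

On a system without the fix this would be expected to report the package as not configured after step 2.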

[Regression Potential]

 * If somebody used the non recommended sysV init script directly it will
   after the upgrade be removed and therefore non-functional.
   Other than that nobody should miss anything, as everything else stays
   as-is.

[Other Info]

 * I'm open for discussion, we might decide to NOT rm_conffile the sysV
   script for SRUs if that would be preferred by the SRU Team - let me
   know in that case.

---

Hello

During an upgrade some packages did not install; I don't know anything more.

(Sorry for my bad English.)

ProblemType: Package
DistroRelease: Ubuntu 18.04
Package: ibacm 17.1-1
ProcVersionSignature: Ubuntu 4.15.0-34.37-generic 4.15.18
Uname: Linux 4.15.0-34-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.2
AptOrdering:
 libudisks2-0:amd64: Install
 NULL: ConfigurePending
Architecture: amd64
Date: Thu Sep 27 17:40:26 2018
ErrorMessage: installed ibacm package post-installation script subprocess returned error exit status 1
InstallationDate: Installed on 2018-05-19 (131 days ago)
InstallationMedia: Xubuntu 18.04 LTS "Bionic Beaver" - Release amd64 (20180426)
Python3Details: /usr/bin/python3.6, Python 3.6.6, python3-minimal, 3.6.5-3ubuntu1
PythonDetails: /usr/bin/python2.7, Python 2.7.15rc1, python-minimal, 2.7.15~rc1-1
RelatedPackageVersions:
 dpkg 1.19.0.5ubuntu2
 apt 1.6.3ubuntu0.1
SourcePackage: rdma-core
Title: package ibacm 17.1-1 failed to install/upgrade: installed ibacm package post-installation script subprocess returned error exit status 1
UpgradeStatus: No upgrade log present (probably fresh install)

Eugenio González (eugeagb) wrote :
Christian Ehrhardt  (paelzer) wrote :

Interesting, thanks for the report.

It seems that starting the service /lib/systemd/system/ibacm.service fails because /sys/class/infiniband_mad/abi_version is not available.

That alone is a reasonable error, as the daemon is meant to be the "InfiniBand Address Cache Manager Daemon" and it can't work well without the backend being loaded.

The backend in this case is a kernel module "ib_umad".

As soon as that is loaded the start of the service works.
I'd recommend that (loading the module manually) to you as a workaround for now.

But for the package we'd want to have defaults safe and working, and in case the module isn't loaded I'd expect it to not start at all or load it on demand.
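
The workaround can be sketched as a small check. It assumes the sysfs path named above is the only thing the daemon is missing; umad_ready is a hypothetical helper name, and the modprobe call is only printed, not run.

```shell
# Path quoted in the comment above; override UMAD_ABI for testing.
UMAD_ABI="${UMAD_ABI:-/sys/class/infiniband_mad/abi_version}"

# umad_ready PATH: succeed if the ib_umad sysfs entry exists.
umad_ready() {
    [ -e "$1" ]
}

if umad_ready "$UMAD_ABI"; then
    echo "ib_umad looks loaded; ibacm.service should be able to start"
else
    echo "ib_umad not loaded; workaround: sudo modprobe ib_umad"
fi
```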

Changed in rdma-core (Ubuntu):
status: New → Confirmed
Christian Ehrhardt  (paelzer) wrote :

The service actually defines:
# Ensure required kernel modules are loaded before starting
<email address hidden>
<email address hidden>

$ dpkg -S /lib/systemd/system/rdma-load-modules@.service
rdma-core: /lib/systemd/system/rdma-load-modules@.service

There is a lot of systemd config chatter around, but essentially it does
ExecStart=/lib/systemd/systemd-modules-load /etc/rdma/modules/%I.conf
Which in this case will be /etc/rdma/modules/rdma.conf

That config does not contain "ib_umad" module.

The solutions that come to mind immediately would either be:
1. add ib_umad to /etc/rdma/modules/rdma.conf to be auto-loaded
2. consider this optional and keep the service off unless an admin has made ib_umad load
   ConditionPathExists=/sys/class/infiniband_mad/abi_version

Given that there might be plenty of cases for #1 to fail (containers, module not available, module fails to load) I'd think that #2 is required and the decision to add #1 is not mine to make (no experience what it actually does).

I see that /etc/rdma/modules/rdma.conf has other commented out modules, so even if we don't want it auto-loaded we might add it disabled by default.

Changed in rdma-core (Ubuntu):
assignee: nobody → Ubuntu Foundations Team (ubuntu-foundations-team)
Christian Ehrhardt  (paelzer) wrote :

Upstream has a bunch of default configs listing the missing module.
kernel-boot/modules/infiniband.conf
kernel-boot/modules/opa.conf

# Access to fabric management SMPs and GMPs from userspace.
ib_umad

Those files are available at /etc/rdma/modules/ but they are not loaded - probably because I have no such IB device available as those files seem to be considered via udev rules.

That said I think there are plenty of cases where the rules would not load the module and the service therefore fails to start.

I'd forget about my suggestion #1 above after learning that, but would think that #2 would be the right thing to avoid the service failing to start in so many other environments.
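
To see which of the shipped module lists name the module, something like the following could be used. It is only a sketch: configs_listing is a hypothetical helper, and a file listing the module says nothing about whether the udev rules ever caused it to be loaded.

```shell
# configs_listing MODULE DIR: print each *.conf file in DIR that has MODULE
# on a line of its own (commented-out entries such as "# ib_umad" do not match).
configs_listing() {
    for f in "$2"/*.conf; do
        [ -f "$f" ] || continue
        if grep -qx "[[:space:]]*$1[[:space:]]*" "$f"; then
            echo "$f"
        fi
    done
}

configs_listing ib_umad /etc/rdma/modules
```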

Christian Ehrhardt  (paelzer) wrote :

I subscribed the foundations team as well on the bug.

@Foundations - do you have a semi-dedicated or at least experienced RDMA guy to make the decision about the right next steps on this bug?

Suggestion:
add
  ConditionPathExists=/sys/class/infiniband_mad/abi_version
to
  /lib/systemd/system/ibacm.service
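
For local experiments the same condition could be tried through a systemd drop-in instead of editing the packaged unit. This is a sketch only, not the packaging change itself; the function name and the drop-in file name 10-require-umad.conf are hypothetical.

```shell
# write_condition_dropin ROOT: create a drop-in for ibacm.service under
# ROOT/ibacm.service.d/ that adds the suggested condition, and print its path.
# The file name 10-require-umad.conf is hypothetical.
write_condition_dropin() {
    dir="$1/ibacm.service.d"
    mkdir -p "$dir"
    cat > "$dir/10-require-umad.conf" <<'EOF'
[Unit]
ConditionPathExists=/sys/class/infiniband_mad/abi_version
EOF
    echo "$dir/10-require-umad.conf"
}

# Example (as root):
#   write_condition_dropin /etc/systemd/system && systemctl daemon-reload
```

With the condition in place systemd would skip the start instead of marking the unit as failed when the path is absent.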

summary: - package ibacm 17.1-1 failed to install/upgrade: installed ibacm package
- post-installation script subprocess returned error exit status 1
+ ibacm service fails if ib_umad module is not loaded
Steve Langasek (vorlon) wrote :

Christian, sorry, Foundations does not have any practical experience with the RDMA stack. At least on paper, this package is owned by the Server Team (http://reqorts.qa.ubuntu.com/reports/m-r-package-team-mapping.html).

Christian Ehrhardt  (paelzer) wrote :

Ok, thanks Steve for the info

Changed in rdma-core (Ubuntu):
assignee: Ubuntu Foundations Team (ubuntu-foundations-team) → nobody
Christian Ehrhardt  (paelzer) wrote :

This has existed in the wild since at least the release of 18.04, so given the choice between rushing a fix in now and fixing it in 19.04 followed by an SRU (to 18.04 and 18.10), I chose the latter.

It is broken in source:
./ibacm/ibacm.service.in
From https://github.com/linux-rdma/rdma-core

Submitted upstream to the mailing list and as
https://github.com/linux-rdma/rdma-core/pull/393

Christian Ehrhardt  (paelzer) wrote :

Upstream raised a few fair questions; the discussion continues at https://www.spinics.net/lists/linux-rdma/msg69992.html but the archive updates slowly.

Christian Ehrhardt  (paelzer) wrote :

Summarizing Upstream feedback:
#1
# apt install ibacm
[...]
ibacm.service failed
Question: why does it even start? It is meant to be socket activated.

That is correct, it should be - but the packaging only uses dh_installinit, which will start it as it doesn't know about service/sockets.

I need to experiment with a d/rules fix to make this better.

#2 The ConditionPath check would be bad for hotplug. I doubt that and will reply accordingly.

Christian Ehrhardt  (paelzer) wrote :

We have a manual section for override_dh_installinit
override_dh_installinit:
	dh_installinit -pibacm --onlyscripts
	dh_installinit -prdma-core --onlyscripts --name=iwpmd
	dh_installinit --remaining-packages

And only the auto sections for dh_installsystemd, which would do the right thing these days (for systemd only).

There is also /etc/init.d/ibacm for old style sysV, but due to the sysV->systemd wrapping this mis-starts the systemd unit.
This is our issue ^^

We can:
- remove the sysV file - that would avoid the dh helpers to pick it up and mis-start it through those wrappers.
- install the sysV scripts with --no-start (to keep them available but not messing up through the wrappers)

Both somewhat inhibit the sysV compat, with the second less so.
And since systemd is the de facto default init, and sysV is not supported in Ubuntu anyway, I think we can try the latter and move to the former if that is not enough.

I have put a ppa with such a fix to [1].

Tests:
#1 - Upgrading from broken state to the fixed PPA: working
     This no longer has the restart, so the upgrade works (otherwise it was a failed postinst).
     The systemd services stay "as they are" which is socket up and service failed (ok).
#2 - fresh install from fixed PPA: not working
     The no-start of dh_installinit has carried over to dh_installsystemd which is not what we
     want.

A second approach (before dropping the sysV script) would be to not call dh_installinit for it in the override section; the result will depend on what it picks up automatically.

Tests:
#1 - Upgrading from broken state to the fixed PPA: working
     This no longer has the restart, so the upgrade works (otherwise it was a failed postinst).
     The systemd services stay "as they are" which is socket up and service failed (ok).
#2 - fresh install from fixed PPA: not working
     The service&socket are still dead.

Hrm, it only has the dh_installsystemd snippets to enable, but none to start it.
Need to continue next week on this ...

Once fixed, IMHO combining this with the Condition check would be best. I'll re-ping on the mail thread and extend the PR once I have found a working approach to the sysV-vs-systemd mess in this case.

[1]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3477

Christian Ehrhardt  (paelzer) wrote :

Discussion upstream continues; the suggested approach with the path check (ConditionPathExists) isn't helping.
I have created some debug code for the discussion and replied to the ML thread once more.
- https://gist.github.com/cpaelzer/fc3abd28f81eda55ffb317bb4091bf48
- https://gist.github.com/cpaelzer/80331eb7b2a74836b52522bd076a5296

Fully resolving this issue might become a long term upstream issue.
For now here we'd want to focus on at least getting the default package install not immediately run into failed state - everything else might be up to upstream discussion and development eventually.

Christian Ehrhardt  (paelzer) wrote :

It seems we either have to write all maintscript code for ibacm ourselves or just drop the no longer needed sysV script.
Built version ~ppa4 in the linked PPA.

But now dh_installsystemd wants to start socket (correct) AND service (wrong) - arr.

Christian Ehrhardt  (paelzer) wrote :

Looks like this in postinst:
  deb-systemd-invoke $_dh_action 'ibacm.service' 'ibacm.socket' >/dev/null || true
I already had a bug (with Debian) about debhelper starting both socket and service when it should not, but I fail to find it atm.
I also get the feeling I am missing something here :-/

Well, it is fast and easy to iterate on experiments for now.

Note: as with my other tests the trigger to the ibacm socket can easily be reproduced with:
$ echo "Hello World2"| netcat -U /run/ibacm-unix.sock

Tests:
#1 - Upgrading from broken state to the fixed PPA: working
     This no longer has the restart, so the upgrade works (otherwise it was a failed postinst).
     The systemd services stay "as they are" which is socket up and service failed (ok).
     A connect to the socket after the upgrade tries to start the service (ok if working,
     propagating the fail to the socket if failing)
       ibacm.socket: Failed with result 'service-start-limit-hit'.
     That is ok, the package upgrade fixed the package install status and socket activation would
     still work.
#2 - fresh install from fixed PPA: working
     The service is loaded and enabled and the socket is active
     A connect to the socket tries to start the service (ok if working,
     propagating the fail to the socket if failing)

Overall:
- the package postinst is no longer broken
- the module for it has to be available late (when the socket is used) instead of on-install
  allowing some hotplug
- It isn't too important if it fails on the condition check or as it does now

The log is correct:
  /var/log/ibacm.log:
  [...]
  2018-10-16T10:55:11.560: acm_open_devices: ERROR - unable to get device list
  2018-10-16T10:55:11.560: main: ERROR - unable to open any devices
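
A quick way to pull just those lines out of the log, sketched with a hypothetical helper name (ibacm_errors) and assuming the log location shown above:

```shell
# ibacm_errors LOGFILE: print the ERROR lines from an ibacm log, if any.
# A missing or unreadable log is treated as "no errors to show".
ibacm_errors() {
    [ -r "$1" ] || return 0
    grep 'ERROR' "$1" || true
}

ibacm_errors /var/log/ibacm.log
```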

That said, upstream or at least Debian (which I think is the same in this case) would need to agree to drop the sysV script.

I'll propose it there (PR that is already linked) and on the Mailing list once I rebased my tests for them.

Christian Ehrhardt  (paelzer) wrote :

Replied on the ML about the new approach.
New PR [1] superseded the old one.

I have another idea which would keep sysV init around, but in --no-start, IF (and that is an if) I can get dh_installsystemd to ignore what I told dh_installinit.
More on that once I tested it.

[1]: https://github.com/linux-rdma/rdma-core/pull/401

Christian Ehrhardt  (paelzer) wrote :

I had the idea of maybe keeping the sysV file (if someone wants it for odd setups), but trying to spread --no-start all over the place to hopefully reach a situation where the .socket but nothing else is started on install/upgrade.

Therefore it would be less invasive, but still achieve what we want.
See [1] for a branch with that.

But as I experienced in the past, it does not start the socket then.
dh_installinit section does only update (as intended):
  update-rc.d ibacm defaults
but dh_installsystemd takes over the --no-start and ends up calling "deb-systemd-helper update-state 'ibacm.socket'" twice.

The last option would be to manually code the snippets correctly to have:
- sysV init installed but not started
- .service installed but not started
- .socket installed and started

Hrm, that should be a common case - I'm getting more and more sure I'm missing something ...
I'll compare a few other packages.
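
The systemd part of that target state (socket started, service left to socket activation) can be checked right after an install with a sketch like this. The helper names are hypothetical, and the sysV script is not covered.

```shell
# unit_state UNIT: print what "systemctl is-active" reports for UNIT
# (empty if systemctl is unavailable).
unit_state() {
    systemctl is-active "$1" 2>/dev/null || true
}

# target_state_ok: succeed if the socket is active and the service itself
# has not been started (it is left to socket activation).
target_state_ok() {
    [ "$(unit_state ibacm.socket)" = "active" ] &&
        [ "$(unit_state ibacm.service)" != "active" ]
}

if target_state_ok; then
    echo "ok: ibacm.socket active, ibacm.service not started"
else
    echo "not in the target state"
fi
```

Once something connects to the socket on a machine with working hardware, the service becoming active is expected, so the check is only meaningful directly after install or upgrade.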

[1]: https://github.com/cpaelzer/rdma-core/tree/debian-avoid-unconditional-ibacm-start-noremove

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package rdma-core - 21.0-1

---------------
rdma-core (21.0-1) unstable; urgency=medium

  * New upstream release.
    - Drop ibacm sysV init script to avoid issues with the sysV to systemd
      wrapper starting the service instead of the socket (LP: #1794825)
    - Include static libraries in the build
  * Update private libibverbs symbols
  * Specify Build-Depends-Package in symbols

 -- Benjamin Drung <email address hidden> Tue, 20 Nov 2018 11:49:25 +0100

Changed in rdma-core (Ubuntu):
status: Confirmed → Fix Released
Christian Ehrhardt  (paelzer) wrote :

Picked the now upstream accepted change.
It is already in Debian (and synced to Disco).
Prepped for Bionic and Cosmic in PPA.

description: updated
Christian Ehrhardt  (paelzer) wrote :

Verified once more from PPA build.
And added the SRU template, ready for upload to SRU queue.

Changed in rdma-core (Ubuntu Bionic):
status: New → Triaged
Changed in rdma-core (Ubuntu Cosmic):
status: New → Triaged
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Eugenio, or anyone else affected,

Accepted rdma-core into cosmic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/rdma-core/19.0-1ubuntu0.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-cosmic to verification-done-cosmic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-cosmic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in rdma-core (Ubuntu Cosmic):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-cosmic
Christian Ehrhardt  (paelzer) wrote :

Hmm, only cosmic was accepted; Bionic still waits.
So let's verify just cosmic for now ...

Rechecked non proposed.
Install directly triggers the issue of the service breaking and the package install being considered bad by dpkg.

Same install from proposed works.
Package installs fine - service might even be down from pre-upgrade installs, but that is no problem anymore.

Setting verified

tags: added: verification-done verification-done-cosmic
removed: verification-needed verification-needed-cosmic
Brian Murray (brian-murray) wrote :

Hello Eugenio, or anyone else affected,

Accepted rdma-core into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/rdma-core/17.1-1ubuntu0.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in rdma-core (Ubuntu Bionic):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-bionic
removed: verification-done
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package rdma-core - 19.0-1ubuntu0.1

---------------
rdma-core (19.0-1ubuntu0.1) cosmic; urgency=medium

  * Drop ibacm sysV init script to avoid issues with the sysV to systemd
    wrapper starting the service instead of the socket (LP: #1794825)
    - d/rules: drop dh_installinit call for ibacm
    - d/rules: let dh_installsystemd not start ibacm.service
    - d/ibacm.maintscript: remove old conffile
    - d/ibacm.install: no more install sysV script

 -- Christian Ehrhardt <email address hidden> Thu, 29 Nov 2018 14:08:48 +0100

Changed in rdma-core (Ubuntu Cosmic):
status: Fix Committed → Fix Released
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for rdma-core has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Eugenio González (eugeagb) wrote :

Hello

Thanks a lot for your attention and effort. After enabling proposed and updating, the error message no longer appears.

Christian Ehrhardt  (paelzer) wrote :

Works on Bionic for me just as well as in Cosmic.
Eugenio didn't report on what he tested but his report was 18.04 as well.

Setting verified

tags: added: verification-done verification-done-bionic
removed: verification-needed verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package rdma-core - 17.1-1ubuntu0.1

---------------
rdma-core (17.1-1ubuntu0.1) bionic; urgency=medium

  * Drop ibacm sysV init script to avoid issues with the sysV to systemd
    wrapper starting the service instead of the socket (LP: #1794825)
    - d/rules: drop dh_installinit call for ibacm
    - d/rules: let dh_installsystemd not start ibacm.service
    - d/ibacm.maintscript: remove old conffile
    - d/ibacm.install: no more install sysV script

 -- Christian Ehrhardt <email address hidden> Thu, 29 Nov 2018 14:08:48 +0100

Changed in rdma-core (Ubuntu Bionic):
status: Fix Committed → Fix Released
Brian Murray (brian-murray) wrote : Proposed package upload rejected

An upload of rdma-core to disco-proposed has been rejected from the upload queue for the following reason: "Your source.changes file contains changes for more than the version you intended, please reupload it. https://launchpadlibrarian.net/421428873/rdma-core_22.1-1ubuntu0.1_source.changes".

Christian Ehrhardt  (paelzer) wrote :

Stray update due to my changes file covering too much of the past - not related to this bug.
