Pacemaker package upgrades stop but fail to start pacemaker resulting in HA outage

Bug #1727063 reported by Drew Freiberger
This bug affects 5 people
Affects                        Status        Importance  Assigned to  Milestone
OpenStack HA Cluster Charm     Invalid       Critical    Unassigned
init-system-helpers (Ubuntu)   Opinion       Undecided   Unassigned
pacemaker (Ubuntu)             Fix Released  Critical    Unassigned
  Xenial                       Fix Released  Critical    Unassigned
  Zesty                        Fix Released  Critical    Unassigned
  Artful                       Fix Released  Critical    Unassigned
  Bionic                       Fix Released  Critical    Unassigned

Bug Description

[Impact]
Upgrades of the pacemaker package stop pacemaker but do not restart it afterwards, leaving HA clusters down.

[Test Case]
sudo apt install pacemaker
sudo systemctl start pacemaker
sudo dpkg-reconfigure pacemaker

The pacemaker daemons will not be restarted.

[Regression Potential]
Minimal; earlier and later versions provide the defaults in the LSB header.

[Original Bug Report]
We have found on our OpenStack charm-hacluster implementations that the pacemaker .deb packaging, together with the upstream pacemaker configuration, results in pacemaker stopping but not starting on package upgrade (whether attended or unattended).

This was seen on three separate Xenial clouds, on both Mitaka and Ocata.

The package upgrade today was to pacemaker 1.1.14-2ubuntu1.2.

It appears that pacemaker.prerm stops the service using "invoke-rc.d pacemaker stop", and pacemaker.postinst then attempts to start the service but silently fails due to policy denial. The policy check appears to fail because /etc/rcX.d/S*pacemaker does not exist, which in turn is because /etc/init.d/pacemaker has no Default-Start or Default-Stop entries in its LSB init headers (or rather, they are blank).

I have not checked whether this affects trusty environments.
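
One way to confirm the blank LSB fields described above is to read the header straight out of the init script. The following is only a sketch: the function name lsb_default_start is made up for illustration, and it assumes the header is delimited by the usual "### BEGIN INIT INFO" / "### END INIT INFO" markers.

```shell
#!/bin/sh
# Print the Default-Start value from the LSB header of an init script.
# Prints an empty line when the field is present but blank.
# lsb_default_start is a hypothetical helper name, not part of any package.
lsb_default_start() {
    # $1: path to the init script, e.g. /etc/init.d/pacemaker
    sed -n '/^### BEGIN INIT INFO/,/^### END INIT INFO/{
/^#[[:space:]]*Default-Start:/{
s/^#[[:space:]]*Default-Start:[[:space:]]*//
s/[[:space:]]*$//
p
}
}' "$1"
}
```

Calling it as lsb_default_start /etc/init.d/pacemaker on an affected system would be expected to print nothing useful, whereas a fixed package should list the multi-user runlevels.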

I'd suggest that on systems using systemd, the pacemaker.postinst script should check whether the service is enabled and start it with systemctl commands rather than using the cross-platform invoke-rc.d wrappers. Alternatively, upstream pacemaker should get default start/stop entries.
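
A rough sketch of what such a systemd-aware step could look like is shown here. It is not the fix that was eventually shipped and not taken from any pacemaker maintainer script; the function name start_if_enabled and the SYSTEMCTL override variable are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of a systemd-aware start step for a maintainer script.
# SYSTEMCTL is a hypothetical override hook so the logic can be exercised
# without touching a real service manager; it defaults to systemctl.
: "${SYSTEMCTL:=systemctl}"

start_if_enabled() {
    # $1: unit name, e.g. pacemaker.service
    unit=$1
    if "$SYSTEMCTL" --quiet is-enabled "$unit"; then
        # The unit is enabled, so bring it back up after the upgrade.
        "$SYSTEMCTL" start "$unit"
    else
        # Respect the administrator's choice to keep the unit disabled.
        echo "$unit is not enabled; leaving it stopped" >&2
    fi
}
```

A postinst using this approach would call start_if_enabled pacemaker.service in its configure step, so the decision depends on the unit's enabled state rather than on /etc/rcX.d links.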

Our default runlevel on cloud-init built images appears to be 5 (graphical), so at least 5 should be present in the Default-Start: field of the /etc/init.d/pacemaker LSB init headers.

Revision history for this message
David Ames (thedac) wrote :

From the charm perspective we need to determine if the charm does anything beyond the packaging that could lead to this.

The charm runs:
update-rc.d -f pacemaker defaults

Testing.

Changed in charm-hacluster:
importance: Undecided → Critical
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in pacemaker (Ubuntu):
status: New → Confirmed
tags: added: cpe-onsite
Revision history for this message
James Page (james-page) wrote :
Revision history for this message
James Page (james-page) wrote :
Revision history for this message
James Page (james-page) wrote :

Other related bugs - bug 1322899 and bug 1052449

Changed in pacemaker (Ubuntu):
status: Confirmed → Triaged
importance: Undecided → High
importance: High → Critical
James Page (james-page)
Changed in pacemaker (Ubuntu Zesty):
importance: Undecided → Critical
Changed in pacemaker (Ubuntu Artful):
importance: Undecided → Critical
Changed in pacemaker (Ubuntu Xenial):
status: New → Triaged
Changed in pacemaker (Ubuntu Zesty):
status: New → Triaged
Changed in pacemaker (Ubuntu Xenial):
importance: Undecided → Critical
Changed in pacemaker (Ubuntu Artful):
status: New → Triaged
Revision history for this message
James Page (james-page) wrote :

I think this is a packaging/upstream bug which I appear to have had a go at before (see #4); however I've not seen this specific symptom before.

Changed in charm-hacluster:
status: New → Invalid
Revision history for this message
James Page (james-page) wrote :

Marking charm bug task as invalid - it's doing the right things; however, the package does not declare default start/stop levels, so it's all foobar.

Revision history for this message
James Page (james-page) wrote :

Hmmm - however pacemaker@xenial ships with a native systemd unit; so I'm baffled as to the behaviour of the package upgrade.

description: updated
Revision history for this message
James Page (james-page) wrote :

This issue can be fixed by re-adding the required Start/Stop bits to the LSB header; however the behaviour of update-rc.d when native systemd unit files are in use looks odd to me.

Raising an init-system-helpers bug task for foundations team input on this.

Revision history for this message
James Page (james-page) wrote :

Actually this only impacts Xenial; Debian carried a patch from:

pacemaker (1.1.15~rc3-1) unstable; urgency=medium

  [ Christoph Berg ]
  * [23ee108] libcrmservice3.symbols: Exclude systemd symbol on non-linux
  * [41dea05] Fix time formatting on x32
  * [533c5cc] Fix FTBFS on GNU Hurd

  [ Arturo Borrero Gonzalez ]
  * [698053d] d/tests/control: add isolation-container restriction

  [ Ferenc Wágner ]
  * [065159d] New patch Enable-the-init-scripts-on-multi-user-runlevels.patch
  * [7a5008b] New patch Make-the-asciidoc-documentation-reproducible.patch
  * [08a4162] New patch Add-remote_fs-dependencies-to-the-init-scripts.patch
  * [7a65d2c] New upstream release (1.1.15~rc1)
  * [dd9f5f4] Remove upstreamed patches, refresh the rest
  * [0cdd116] Update symbol files
  * [3de7a21] New patch Add-documentation-URIs-to-the-service-files.patch
  * [9faaa02] Move documentation generators into Build-Depends-Indep
  * [9362a46] Move documentation into /usr/share/doc/pacemaker
  * [64c0e84] Move misc documentation files into the pacemaker-doc package
  * [837e74d] New patch Read-default-files-in-pacemaker.service.patch
  * [72fef80] New upstream release (1.1.15~rc2)
  * [69ee575] Remove patch included in 1.1.15~rc2
  * [46dce98] Also add documentation URI to our version of crm_mon.service
  * [8c25aff] New patch to avoid using WIFCONTINUED entirely
  * [a28e0ae] New upstream release (1.1.15~rc3)
  * [a8976fe] Remove freshly upstreamed patches, refresh the rest

 -- Ferenc Wágner <email address hidden> Sat, 28 May 2016 22:28:49 +0200

Which re-enables the default runlevels.

Changed in pacemaker (Ubuntu Bionic):
status: Triaged → Fix Released
Changed in pacemaker (Ubuntu Artful):
status: Triaged → Fix Released
Changed in pacemaker (Ubuntu Zesty):
status: Triaged → Fix Released
Revision history for this message
James Page (james-page) wrote :

Proposed update for xenial

Revision history for this message
James Page (james-page) wrote :

Doing some testing via:

  https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3010

before upload to xenial-proposed; note that the act of fixing this problem will in itself cause the pacemaker daemons to be started on upgrade. However, it won't be auto-applied on a default install (unlike the security update that caused the original issue).

Revision history for this message
James Page (james-page) wrote :

Tested OK (pacemaker was restarted after upgrade)

Uploaded to unapproved queue for SRU Team review - I'd suggest we fast-track this ASAP into -updates and -security to catch auto updates for those systems not already updated.

James Page (james-page)
description: updated
Revision history for this message
James Page (james-page) wrote :
Revision history for this message
James Page (james-page) wrote :

Testing from security-proposed PPA:

Broken install (dpkg-reconfigure pacemaker already run, daemons stopped): Packages updated and pacemaker daemon restarted by postinst: OK

Running install (pacemaker daemons running prior to pkg upgrade): Packages updated, pacemaker daemon stopped prior to unpack and restarted by postinst: OK

From my perspective the update looks good - but would appreciate a second pair of eyes on this one.

Revision history for this message
James Page (james-page) wrote :

Also tested the proposed fix on a three-unit gnocchi HA deployment; all pacemaker updates applied and pacemaker was restarted by postinst as desired.

Revision history for this message
Edward Hope-Morley (hopem) wrote :

Tested proposed package upgrade and works for me - http://pastebin.ubuntu.com/25815820/

Revision history for this message
James Page (james-page) wrote :

Also validated that the update fixes an existing machine with the 1.2 update on it (with pacemaker services not running).

Pacemaker started after the update.

Revision history for this message
Andy Whitcroft (apw) wrote :

Did some touch testing in a VM and confirmed that the 1.2 package loses its daemons on any kind of reconfigure. This behaviour is rectified with the 1.3 package.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pacemaker - 1.1.14-2ubuntu1.3

---------------
pacemaker (1.1.14-2ubuntu1.3) xenial; urgency=medium

  * Fix default start/stop levels for init scripts, ensuring that
    running pacemaker daemons are restarted after package upgrades
    (LP: #1727063):
    - d/p/Enable-the-init-scripts-on-multi-user-runlevels.patch: Cherry
      picked fix from later package versions.

 -- James Page <email address hidden> Wed, 25 Oct 2017 10:04:30 +0100

Changed in pacemaker (Ubuntu Xenial):
status: Triaged → Fix Released
Revision history for this message
Andy Whitcroft (apw) wrote : Update Released

The verification of the Stable Release Update for pacemaker has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

tags: added: patch
Ryan Beisner (1chb1n)
tags: added: uosci
Revision history for this message
Drew Freiberger (afreiberger) wrote :

re: init-system-helpers, I noticed the oddity of the init script on a systemd system as well and found that there's a systemd hack in /lib/lsb/init-functions.d/40-systemd that allows for multi-startup compatibility. I believe invoke-rc.d should check systemd "enabled/disabled" state instead of just the S/K links in /etc/rcX.d.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in init-system-helpers (Ubuntu Artful):
status: New → Confirmed
Changed in init-system-helpers (Ubuntu Xenial):
status: New → Confirmed
Changed in init-system-helpers (Ubuntu Zesty):
status: New → Confirmed
Changed in init-system-helpers (Ubuntu):
status: New → Confirmed
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Is systemd's sysv-generator prioritized over regular systemd unit files? My question arises from this fix. It looks like unit files were automatically created by the generator because of the wrong LSB parameters in the init files, but at the same time Xenial should be using the systemd unit files instead (as they are present in this package).

Revision history for this message
Drew Freiberger (afreiberger) wrote :

From a high level, it appears that the invoke-rc.d script used for compatibility falls back to checking for /etc/rcX.d symlinks as its "policy" check if there is no $POLICYHELPER installed. Perhaps the actual shortcoming is not having a policy-rc.d installed to prefer systemd over init.d on Xenial.
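
To illustrate the kind of symlink lookup described here, the sketch below reports whether a service has a start link for a given runlevel. It is a rough emulation only, not invoke-rc.d's actual code; the function name has_start_link and the optional config-root argument are hypothetical and exist so the check can be pointed at a scratch directory.

```shell
#!/bin/sh
# Rough emulation of a start-link lookup: succeed when the service has an
# S-prefixed link in the rc directory for the given runlevel.
# has_start_link is a hypothetical helper, not part of init-system-helpers.
has_start_link() {
    # $1: service name, $2: runlevel, $3: config root (defaults to /etc)
    service=$1
    runlevel=$2
    etc_dir=${3:-/etc}
    for link in "$etc_dir/rc$runlevel.d"/S[0-9][0-9]"$service"; do
        # An unmatched glob stays literal, so confirm the path really exists.
        if [ -e "$link" ] || [ -L "$link" ]; then
            return 0
        fi
    done
    return 1
}
```

With blank Default-Start and Default-Stop fields no such links get created, so has_start_link pacemaker 5 would be expected to fail on an affected system even though the systemd unit is enabled.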

Revision history for this message
Greg Lutostanski (lutostag) wrote :

Confirmed with Drew that this bug has not been hit since the pacemaker fix released.

Can remove field-high team, but keep for the discussion on init-system-helpers.

David Britton (dpb)
no longer affects: init-system-helpers (Ubuntu Bionic)
no longer affects: init-system-helpers (Ubuntu Artful)
no longer affects: init-system-helpers (Ubuntu Zesty)
no longer affects: init-system-helpers (Ubuntu Xenial)
Changed in init-system-helpers (Ubuntu):
status: Confirmed → Opinion