hostmonitor can not monitor pacemaker_remote node via cibadmin query

Bug #1728527 reported by Hieu LE
This bug affects 3 people
Affects                      Status         Importance   Assigned to   Milestone
Ubuntu Cloud Archive         Fix Released   Undecided    Unassigned
  Rocky                      Won't Fix      Undecided    Unassigned
  Stein                      Fix Released   Undecided    Unassigned
masakari-monitors            Fix Released   Undecided    Liam Young
masakari-monitors (Ubuntu)   Fix Released   High         James Page
  Disco                      Fix Released   High         James Page

Bug Description

Currently, the Masakari host monitor only greps the `crmd` status of real nodes from the output of the `cibadmin -Q` command.
With pacemaker_remote, the `crmd` attribute does not exist, so remote nodes are always marked as being in the `None` state.

Below is an example of the XML status of remote nodes:
<node_state remote_node="true" id="cpu1" uname="cpu1" crm-debug-origin="remote_node_init_status" node_fenced="0">
  <transient_attributes id="cpu1">
    <instance_attributes id="status-cpu1"/>
  </transient_attributes>
</node_state>
<node_state remote_node="true" id="cpu2" uname="cpu2" crm-debug-origin="remote_node_init_status" node_fenced="0"/>

And the log from masakari hostmonitor:
2017-10-30 14:15:44.679 1813 INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] Recognized 'cpu1' as a new member of cluster. Host status is 'None'.
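
The `None` status in the log follows directly from the missing attribute. The sketch below is not the hostmonitor code; it is a minimal illustration that assumes the status is obtained by looking up the `crmd` attribute on each `node_state` element. It wraps the two remote entries from this report in a `<status>` root so the fragment parses, and adds a hypothetical full cluster member `node1` with `crmd="online"` purely for contrast. The function name `crmd_status_by_host` is made up for this example.

```python
import xml.etree.ElementTree as ET

# The cpu1 and cpu2 entries are copied from the bug report. The <status>
# wrapper and the "node1" entry are assumptions added for illustration.
CIB_STATUS = """
<status>
  <node_state id="1" uname="node1" crmd="online"/>
  <node_state remote_node="true" id="cpu1" uname="cpu1" crm-debug-origin="remote_node_init_status" node_fenced="0">
    <transient_attributes id="cpu1">
      <instance_attributes id="status-cpu1"/>
    </transient_attributes>
  </node_state>
  <node_state remote_node="true" id="cpu2" uname="cpu2" crm-debug-origin="remote_node_init_status" node_fenced="0"/>
</status>
"""


def crmd_status_by_host(cib_xml):
    """Map each node's uname to its crmd attribute (None when absent)."""
    root = ET.fromstring(cib_xml.strip())
    return {
        node.get("uname"): node.get("crmd")
        for node in root.iter("node_state")
    }


statuses = crmd_status_by_host(CIB_STATUS)
for host, status in statuses.items():
    print(f"{host}: {status}")
```

Only the hypothetical member node yields a status; both remote nodes map to `None`, which matches the status reported in the log.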

Changed in masakari-monitors:
assignee: nobody → takahara.kengo (takahara.kengo)
Changed in masakari-monitors:
assignee: takahara.kengo (takahara.kengo) → Hieu LE (hieulq)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to masakari-monitors (master)

Fix proposed to branch: master
Review: https://review.openstack.org/647756

Changed in masakari-monitors:
assignee: Hieu LE (hieulq) → Liam Young (gnuoy)
Adam Spiers (adam.spiers) wrote :

Pacemaker is a cluster manager rather than monitoring software, so IIUC its (host) state was not really designed to be polled via this kind of "pull" model - instead its state machine was designed to "push" events and initiate actions via resource agents. Therefore long term I think we need to implement https://storyboard.openstack.org/#!/story/2002124 which replaces this hostmonitor with the nova-host-alerter OCF RA.

Changed in masakari-monitors (Ubuntu Disco):
status: New → Triaged
status: Triaged → New
James Page (james-page)
Changed in masakari-monitors (Ubuntu Disco):
status: New → In Progress
importance: Undecided → High
assignee: nobody → James Page (james-page)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package masakari-monitors - 7.0.0~rc1-0ubuntu2

---------------
masakari-monitors (7.0.0~rc1-0ubuntu2) disco; urgency=medium

  [ Corey Bryant ]
  * d/control: Set source package Section to net, fixing
    binary-control-field-duplicates-source lintian tag.

  [ James Page ]
  * d/p/bug1728527.patch: Cherry pick fix to resolve issues with use of
    pacemaker-remote for remote management of hypervisors (LP: #1728527).

 -- Corey Bryant <email address hidden> Tue, 09 Apr 2019 16:22:18 +0100

Changed in masakari-monitors (Ubuntu Disco):
status: In Progress → Fix Released
Changed in cloud-archive:
status: New → Fix Committed
James Page (james-page) wrote :

This bug was fixed in the package masakari-monitors - 7.0.0~rc1-0ubuntu2~cloud0
---------------

 masakari-monitors (7.0.0~rc1-0ubuntu2~cloud0) bionic-stein; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 masakari-monitors (7.0.0~rc1-0ubuntu2) disco; urgency=medium
 .
   [ Corey Bryant ]
   * d/control: Set source package Section to net, fixing
     binary-control-field-duplicates-source lintian tag.
 .
   [ James Page ]
   * d/p/bug1728527.patch: Cherry pick fix to resolve issues with use of
     pacemaker-remote for remote management of hypervisors (LP: #1728527).

Changed in cloud-archive:
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to masakari-monitors (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/681965

OpenStack Infra (hudson-openstack) wrote : Fix merged to masakari-monitors (master)

Reviewed: https://review.opendev.org/647756
Committed: https://git.openstack.org/cgit/openstack/masakari-monitors/commit/?id=dc9b77772417c99368a4bbe243bb8e0e7c0bca47
Submitter: Zuul
Branch: master

commit dc9b77772417c99368a4bbe243bb8e0e7c0bca47
Author: Liam Young <email address hidden>
Date: Tue Mar 19 20:05:22 2019 +0000

    Use crm_mon for pacemaker-remote deployments

    As described in bug #1728527 cibadmin does not expose the state of
    the pacemaker-remote nodes which means hostmonitor cannot track
    them. This change switches to use crm_mon to check the status of
    remote nodes if the new config option host.restrict_to_remotes
    is set to True. This will trigger host monitor to use crm_mon
    to monitor nodes and will only monitor nodes that are marked
    as remotes (not members).

    Change-Id: I3f2026805413504c875ea5f39eb036d44b26dd43
    Depends-On: Iaa2251708616e9c69817bf5b346d795ea7a4d21b
    Closes-Bug: #1728527
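
A rough sketch of the approach described in this commit message, not the merged patch itself: read node state from crm_mon's XML output and, when restricted to remotes, skip full cluster members. The sample XML is hand-written and only modeled on crm_mon output. The `online` and `type` attribute names, the node names, and the function `remote_node_status` are assumptions for illustration. Only the option name `restrict_to_remotes` comes from the commit message.

```python
import xml.etree.ElementTree as ET

# Hand-written sample modeled on crm_mon XML output. Node names and
# attribute values are illustrative assumptions, not captured output.
CRM_MON_XML = """
<crm_mon>
  <nodes>
    <node name="ctrl1" id="1" online="true" type="member"/>
    <node name="cpu1" id="cpu1" online="true" type="remote"/>
    <node name="cpu2" id="cpu2" online="false" type="remote"/>
  </nodes>
</crm_mon>
"""


def remote_node_status(crm_mon_xml, restrict_to_remotes=True):
    """Return {node name: 'online' | 'offline'} from crm_mon-style XML.

    With restrict_to_remotes=True, nodes whose type is not "remote"
    (that is, full cluster members) are skipped.
    """
    root = ET.fromstring(crm_mon_xml.strip())
    status = {}
    # Look only at <node> entries directly under <nodes>, so <node>
    # elements nested elsewhere in the document are not picked up.
    for node in root.findall("./nodes/node"):
        if restrict_to_remotes and node.get("type") != "remote":
            continue
        is_online = node.get("online") == "true"
        status[node.get("name")] = "online" if is_online else "offline"
    return status


remotes_only = remote_node_status(CRM_MON_XML)
all_nodes = remote_node_status(CRM_MON_XML, restrict_to_remotes=False)
```

With the sample above, `remotes_only` contains only `cpu1` and `cpu2`, while `all_nodes` also includes the member `ctrl1`.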

Changed in masakari-monitors:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to masakari-monitors (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/688021

OpenStack Infra (hudson-openstack) wrote : Fix merged to masakari-monitors (stable/train)

Reviewed: https://review.opendev.org/688021
Committed: https://git.openstack.org/cgit/openstack/masakari-monitors/commit/?id=b02c6b6931c0256f4ce6d7167c97ebb849ff3453
Submitter: Zuul
Branch: stable/train

commit b02c6b6931c0256f4ce6d7167c97ebb849ff3453
Author: Liam Young <email address hidden>
Date: Tue Mar 19 20:05:22 2019 +0000

    Use crm_mon for pacemaker-remote deployments

    As described in bug #1728527 cibadmin does not expose the state of
    the pacemaker-remote nodes which means hostmonitor cannot track
    them. This change switches to use crm_mon to check the status of
    remote nodes if the new config option host.restrict_to_remotes
    is set to True. This will trigger host monitor to use crm_mon
    to monitor nodes and will only monitor nodes that are marked
    as remotes (not members).

    Change-Id: I3f2026805413504c875ea5f39eb036d44b26dd43
    Depends-On: Iaa2251708616e9c69817bf5b346d795ea7a4d21b
    Closes-Bug: #1728527
    (cherry picked from commit dc9b77772417c99368a4bbe243bb8e0e7c0bca47)

tags: added: in-stable-train
Larry Lile (llile) wrote :

Can this be merged back to Train, please?

Thanks.

James Page (james-page) wrote :

@llile This fix is already in the packaging for OpenStack Train for Ubuntu.

Larry Lile (llile) wrote :

@james-page I'm working with CentOS 7, and the patch doesn't appear in the latest (8.0.0) masakari-monitors release.

Chris MacNaughton (chris.macnaughton) wrote :

@llile That is an issue to raise with CentOS, as James is talking about the Ubuntu releases.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on masakari-monitors (stable/queens)

Change abandoned by Radosław Piliszek (<email address hidden>) on branch: stable/queens
Review: https://review.opendev.org/681965
Reason: very old; branch long unmaintained

Radosław Piliszek (yoctozepto) wrote :

FWIW, there is now the "in_ccm" attribute on both kinds of nodes which would help. I have to revisit this in the Xena cycle.
