Merge ~lucaskanashiro/ubuntu/+source/pacemaker:lp1896223-focal into ubuntu/+source/pacemaker:ubuntu/focal-devel

Proposed by Lucas Kanashiro
Status: Merged
Approved by: Lucas Kanashiro
Approved revision: aeae35a3b6001c04dab691a33ebac179e6fb4b49
Merged at revision: aeae35a3b6001c04dab691a33ebac179e6fb4b49
Proposed branch: ~lucaskanashiro/ubuntu/+source/pacemaker:lp1896223-focal
Merge into: ubuntu/+source/pacemaker:ubuntu/focal-devel
Diff against target: 1293 lines (+1199/-0)
15 files modified
debian/changelog (+21/-0)
debian/patches/series (+16/-0)
debian/patches/ubuntu-2.0.3-demote/lp1896223-01-f1f71b3-Refactor-scheduler-functionize-comparing-on-fail.patch (+181/-0)
debian/patches/ubuntu-2.0.3-demote/lp1896223-02-ef246ff-Fix-scheduler-disallow-on-fail-stop-for-stop.patch (+53/-0)
debian/patches/ubuntu-2.0.3-demote/lp1896223-03-8dceba7-Refactor-scheduler-use-more-appropriate-types.patch (+43/-0)
debian/patches/ubuntu-2.0.3-demote/lp1896223-04-a4d6a20-Low-libpacemaker-don-t-force-stop-when-skipping.patch (+45/-0)
debian/patches/ubuntu-2.0.3-demote/lp1896223-05-98c3b64-Log-libpacemaker-check-for-re-promotes-specifically.patch (+46/-0)
debian/patches/ubuntu-2.0.3-demote/lp1896223-06-2f1e2df-Feature-xml-add-on-fail-demote-option-to-resources.patch (+38/-0)
debian/patches/ubuntu-2.0.3-demote/lp1896223-07-874f75e-Feature-scheduler-new-on-fail-demote-recovery-policy.patch (+355/-0)
debian/patches/ubuntu-2.0.3-demote/lp1896223-08-7eec572-Build-libcrmcommon-bump-CRM-feature-set.patch (+51/-0)
debian/patches/ubuntu-2.0.3-demote/lp1896223-09-204961e-Doc-Pacemaker-Explained-document-new-on-fail.patch (+67/-0)
debian/patches/ubuntu-2.0.3-demote/lp1896223-10-015b5c0-Doc-Pacemaker-Explained-document-no-quorum.patch (+26/-0)
debian/patches/ubuntu-2.0.3-demote/lp1896223-11-0b68344-Refactor-scheduler-functionize-checking-quorum.patch (+61/-0)
debian/patches/ubuntu-2.0.3-demote/lp1896223-12-b1ae359-Feature-scheduler-support-demote-choice-for.patch (+163/-0)
debian/patches/ubuntu-2.0.3-demote/lp1896223-13-d4b9117-Doc-Pacemaker-Explained-correct-on-fail-default.patch (+33/-0)
Reviewer                        Review Type  Date Requested  Status
Christian Ehrhardt (community)                               Approve
Canonical Server                                             Pending
Review via email: mp+395032@code.launchpad.net

Description of the change

Backport the no-quorum-policy=demote and on-fail=demote features for MSSQL servers. It was previously proposed by Rafael and reviewed by Christian here:

https://code.launchpad.net/~rafaeldtinoco/ubuntu/+source/pacemaker/+git/pacemaker/+merge/392511

There was a security update in the meantime and I needed to rebase the changes.

I still want to run the regression tests before moving forward but a review for the packaging work is appreciated :)

Christian Ehrhardt  (paelzer) wrote :

The content of the MP has changed a lot (some patch headers, a whitespace change).
Seems like a re-export from git with other git options after a rebase, but mostly nothing important - yet a lot to re-digest :-)

---

There are also other changes like crm_config_err -> pcmk__config_err, action_t -> pe_action_t, and so on

I re-reviewed these against the changes that formerly went in.
The patches still apply and didn't change semantically (mostly variable names and similar context updates).

Well, the code is complex and "applying" isn't everything.
It could all be fine, or we may have just lost everything we had formerly identified during the backport,
because the differences closely match the former backport notes.

Let me ask one question as an example - and depending on the outcome I'd ask to regenerate ALL patches with that in mind - or I'm happy as is and we can go on.
File: lp1896223-02-ef246ff-Fix-scheduler-disallow-on-fail-stop-for-stop.patch
Rafael: crm_config_err
Lucas: pcmk__config_err
But Lucas's file still says at the top:
  7 [Backport]
  8
  9 This pacemaker version did not use pcmk__config_err() function for
 10 configuration warnings. It used crm_config_err() still.
So either update all the patch headers so that the backport notes match the content,
or, if I have found things that were changed unintentionally, please fix those.

I have checked the example above for pcmk__config_err, but it isn't touched by the patches that went in since the last merge by Rafael. So one of you must be wrong, I guess?

---

One more thing that remains as a todo:
"One bit, on SRUs I usually proof-read the SRU details on the bugs description before we go to the SRU team to be denied. There is nothing there yet, please make sure to add it before uploading."

I asked for that on the old MP and I still ask for it. Your SRU experience is good and you won't write a bad bug description (so I don't need to review it), but keep in mind that it needs to be complete before uploading.

---

TL;DR:
- Please explain whether the patch differences were intentional (or fix them up if not)
- Please add the SRU description before upload
- Please complete the tests before upload

review: Needs Fixing
Lucas Kanashiro (lucaskanashiro) wrote :

You are right, I resolved the conflicts wrongly while cherry-picking the upstream commits. Rafael's solution is the right one. I am going to redo everything to make sure the content of the patches is the same as in Rafael's patches.

The extra/missing lines in the patch headers are probably caused by the tooling I use to manage patches (the git-buildpackage patch queue). I do not think this is a big deal.

The SRU bug description and the regression tests are the next steps; they are already on my todo list.

Christian Ehrhardt  (paelzer) wrote :

Now things LGTM - thanks.
The need for SRU-template addition is known.

+1 for this once the tests have completed.

review: Approve
Lucas Kanashiro (lucaskanashiro) wrote :

I imported version 2.0.3-3ubuntu4.2, which was released after I started working on this, into this branch. The only change I made was setting the version to 2.0.3-3ubuntu4.3; everything else is the same. So I still consider it approved, based on Christian's last comment.

FWIW MS folks tested the package and said it is working as they were expecting. I'll be uploading the package.

Lucas Kanashiro (lucaskanashiro) wrote :

Uploaded:

$ git push pkg upload/2.0.3-3ubuntu4.3
Enumerating objects: 29, done.
Counting objects: 100% (29/29), done.
Delta compression using up to 32 threads
Compressing objects: 100% (24/24), done.
Writing objects: 100% (24/24), 19.60 KiB | 1.78 MiB/s, done.
Total 24 (delta 8), reused 0 (delta 0)
To ssh://git.launchpad.net/ubuntu/+source/pacemaker
 * [new tag] upload/2.0.3-3ubuntu4.3 -> upload/2.0.3-3ubuntu4.3
$ dput ubuntu ../pacemaker_2.0.3-3ubuntu4.3_source.changes
Checking signature on .changes
gpg: ../pacemaker_2.0.3-3ubuntu4.3_source.changes: Valid signature from F823A2729883C97C
Checking signature on .dsc
gpg: ../pacemaker_2.0.3-3ubuntu4.3.dsc: Valid signature from F823A2729883C97C
Uploading to ubuntu (via ftp to upload.ubuntu.com):
  Uploading pacemaker_2.0.3-3ubuntu4.3.dsc: done.
  Uploading pacemaker_2.0.3-3ubuntu4.3.debian.tar.xz: done.
  Uploading pacemaker_2.0.3-3ubuntu4.3_source.changes: done.
Successfully uploaded packages.

Preview Diff

diff --git a/debian/changelog b/debian/changelog
index e7ec315..c4ffd80 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,24 @@
1pacemaker (2.0.3-3ubuntu4.3) focal; urgency=medium
2
3 [ Rafael David Tinoco ]
4 * Post 2.0.3 features: on-fail=demote & no-quorum-policy=demote
5 (LP: #1896223). Added debian/patches/ubuntu-2.0.3-demote/*:
6 - lp1896223-01-f1f71b3-Refactor-scheduler-functionize-comparing-on-fail.patch
7 - lp1896223-02-ef246ff-Fix-scheduler-disallow-on-fail-stop-for-stop.patch
8 - lp1896223-03-8dceba7-Refactor-scheduler-use-more-appropriate-types.patch
9 - lp1896223-04-a4d6a20-Low-libpacemaker-don-t-force-stop-when-skipping.patch
10 - lp1896223-05-98c3b64-Log-libpacemaker-check-for-re-promotes-specifically.patch
11 - lp1896223-06-2f1e2df-Feature-xml-add-on-fail-demote-option-to-resources.patch
12 - lp1896223-07-874f75e-Feature-scheduler-new-on-fail-demote-recovery-policy.patch
13 - lp1896223-08-7eec572-Build-libcrmcommon-bump-CRM-feature-set.patch
14 - lp1896223-09-204961e-Doc-Pacemaker-Explained-document-new-on-fail.patch
15 - lp1896223-10-015b5c0-Doc-Pacemaker-Explained-document-no-quorum.patch
16 - lp1896223-11-0b68344-Refactor-scheduler-functionize-checking-quorum.patch
17 - lp1896223-12-b1ae359-Feature-scheduler-support-demote-choice-for.patch
18 - lp1896223-13-d4b9117-Doc-Pacemaker-Explained-correct-on-fail-default.patch
19
20 -- Lucas Kanashiro <kanashiro@ubuntu.com> Wed, 09 Dec 2020 10:27:00 -0300
21
22pacemaker (2.0.3-3ubuntu4.2) focal; urgency=medium
23
24 * d/rules: Rebuild with QB_KILL_ATTRIBUTE_SECTION to overcome a problem in
diff --git a/debian/patches/series b/debian/patches/series
index c02030b..944751a 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -32,3 +32,19 @@ CVE-2020-25654-4.patch
32CVE-2020-25654-5.patch
33CVE-2020-25654-6.patch
34CVE-2020-25654-7.patch
35#
36# https://bugs.launchpad.net/bugs/1896223
37#
38ubuntu-2.0.3-demote/lp1896223-01-f1f71b3-Refactor-scheduler-functionize-comparing-on-fail.patch
39ubuntu-2.0.3-demote/lp1896223-02-ef246ff-Fix-scheduler-disallow-on-fail-stop-for-stop.patch
40ubuntu-2.0.3-demote/lp1896223-03-8dceba7-Refactor-scheduler-use-more-appropriate-types.patch
41ubuntu-2.0.3-demote/lp1896223-04-a4d6a20-Low-libpacemaker-don-t-force-stop-when-skipping.patch
42ubuntu-2.0.3-demote/lp1896223-05-98c3b64-Log-libpacemaker-check-for-re-promotes-specifically.patch
43ubuntu-2.0.3-demote/lp1896223-06-2f1e2df-Feature-xml-add-on-fail-demote-option-to-resources.patch
44ubuntu-2.0.3-demote/lp1896223-07-874f75e-Feature-scheduler-new-on-fail-demote-recovery-policy.patch
45ubuntu-2.0.3-demote/lp1896223-08-7eec572-Build-libcrmcommon-bump-CRM-feature-set.patch
46ubuntu-2.0.3-demote/lp1896223-09-204961e-Doc-Pacemaker-Explained-document-new-on-fail.patch
47ubuntu-2.0.3-demote/lp1896223-10-015b5c0-Doc-Pacemaker-Explained-document-no-quorum.patch
48ubuntu-2.0.3-demote/lp1896223-11-0b68344-Refactor-scheduler-functionize-checking-quorum.patch
49ubuntu-2.0.3-demote/lp1896223-12-b1ae359-Feature-scheduler-support-demote-choice-for.patch
50ubuntu-2.0.3-demote/lp1896223-13-d4b9117-Doc-Pacemaker-Explained-correct-on-fail-default.patch
diff --git a/debian/patches/ubuntu-2.0.3-demote/lp1896223-01-f1f71b3-Refactor-scheduler-functionize-comparing-on-fail.patch b/debian/patches/ubuntu-2.0.3-demote/lp1896223-01-f1f71b3-Refactor-scheduler-functionize-comparing-on-fail.patch
new file mode 100644
index 0000000..796101c
--- /dev/null
+++ b/debian/patches/ubuntu-2.0.3-demote/lp1896223-01-f1f71b3-Refactor-scheduler-functionize-comparing-on-fail.patch
@@ -0,0 +1,181 @@
1From: Ken Gaillot <kgaillot@redhat.com>
2Date: Thu, 28 May 2020 08:22:00 -0500
3Subject: Refactor: scheduler: functionize comparing on-fail values
4
5The action_fail_response enum values used for the "on-fail" operation
6meta-attribute were initially intended to be in order of severity.
7However as new values were added, they were added to the end (out of severity
8order) to preserve API backward compatibility.
9
10This resulted in a convoluted comparison of values that will only get worse as
11more values are added.
12
13This commit adds a comparison function to isolate that complexity.
14
15Author: Ken Gaillot <kgaillot@redhat.com>
16Origin: upstream, https://github.com/ClusterLabs/pacemaker/commit/f1f71b3
17Bug-Ubuntu: https://bugs.launchpad.net/bugs/1896223
18Reviewed-by: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
19Last-Update: 2020-10-05
20---
21 include/crm/pengine/common.h | 32 ++++++++++++------
22 lib/pengine/unpack.c | 80 +++++++++++++++++++++++++++++++++++++++++---
23 2 files changed, 97 insertions(+), 15 deletions(-)
24
25diff --git a/include/crm/pengine/common.h b/include/crm/pengine/common.h
26index e497f9c..450206e 100644
27--- a/include/crm/pengine/common.h
28+++ b/include/crm/pengine/common.h
29@@ -29,18 +29,29 @@ extern "C" {
30 extern gboolean was_processing_error;
31 extern gboolean was_processing_warning;
32
33-/* order is significant here
34- * items listed in order of accending severeness
35- * more severe actions take precedent over lower ones
36+/* The order is (partially) significant here; the values from action_fail_ignore
37+ * through action_fail_fence are in order of increasing severity.
38+ *
39+ * @COMPAT The values should be ordered and numbered per the "TODO" comments
40+ * below, so all values are in order of severity and there is room for
41+ * future additions, but that would break API compatibility.
42+ * @TODO For now, we just use a function to compare the values specially, but
43+ * at the next compatibility break, we should arrange things properly.
44 */
45 enum action_fail_response {
46- action_fail_ignore,
47- action_fail_recover,
48- action_fail_migrate, /* recover by moving it somewhere else */
49- action_fail_block,
50- action_fail_stop,
51- action_fail_standby,
52- action_fail_fence,
53+ action_fail_ignore, // @TODO = 10
54+ // @TODO action_fail_demote = 20,
55+ action_fail_recover, // @TODO = 30
56+ // @TODO action_fail_reset_remote = 40,
57+ // @TODO action_fail_restart_container = 50,
58+ action_fail_migrate, // @TODO = 60
59+ action_fail_block, // @TODO = 70
60+ action_fail_stop, // @TODO = 80
61+ action_fail_standby, // @TODO = 90
62+ action_fail_fence, // @TODO = 100
63+
64+ // @COMPAT Values below here are out of order for API compatibility
65+
66 action_fail_restart_container,
67
68 /* This is reserved for internal use for remote node connection resources.
69@@ -51,6 +62,7 @@ enum action_fail_response {
70 */
71 action_fail_reset_remote,
72
73+ action_fail_demote,
74 };
75
76 /* the "done" action must be the "pre" action +1 */
77diff --git a/lib/pengine/unpack.c b/lib/pengine/unpack.c
78index d337758..514207e 100644
79--- a/lib/pengine/unpack.c
80+++ b/lib/pengine/unpack.c
81@@ -2724,6 +2724,78 @@ last_change_str(xmlNode *xml_op)
82 return ((when_s && *when_s)? when_s : "unknown time");
83 }
84
85+/*!
86+ * \internal
87+ * \brief Compare two on-fail values
88+ *
89+ * \param[in] first One on-fail value to compare
90+ * \param[in] second The other on-fail value to compare
91+ *
92+ * \return A negative number if second is more severe than first, zero if they
93+ * are equal, or a positive number if first is more severe than second.
94+ * \note This is only needed until the action_fail_response values can be
95+ * renumbered at the next API compatibility break.
96+ */
97+static int
98+cmp_on_fail(enum action_fail_response first, enum action_fail_response second)
99+{
100+ switch (first) {
101+ case action_fail_reset_remote:
102+ switch (second) {
103+ case action_fail_ignore:
104+ case action_fail_recover:
105+ return 1;
106+ case action_fail_reset_remote:
107+ return 0;
108+ default:
109+ return -1;
110+ }
111+ break;
112+
113+ case action_fail_restart_container:
114+ switch (second) {
115+ case action_fail_ignore:
116+ case action_fail_recover:
117+ case action_fail_reset_remote:
118+ return 1;
119+ case action_fail_restart_container:
120+ return 0;
121+ default:
122+ return -1;
123+ }
124+ break;
125+
126+ default:
127+ break;
128+ }
129+ switch (second) {
130+ case action_fail_reset_remote:
131+ switch (first) {
132+ case action_fail_ignore:
133+ case action_fail_recover:
134+ return -1;
135+ default:
136+ return 1;
137+ }
138+ break;
139+
140+ case action_fail_restart_container:
141+ switch (first) {
142+ case action_fail_ignore:
143+ case action_fail_recover:
144+ case action_fail_reset_remote:
145+ return -1;
146+ default:
147+ return 1;
148+ }
149+ break;
150+
151+ default:
152+ break;
153+ }
154+ return first - second;
155+}
156+
157 static void
158 unpack_rsc_op_failure(resource_t * rsc, node_t * node, int rc, xmlNode * xml_op, xmlNode ** last_failure,
159 enum action_fail_response * on_fail, pe_working_set_t * data_set)
160@@ -2783,10 +2855,7 @@ unpack_rsc_op_failure(resource_t * rsc, node_t * node, int rc, xmlNode * xml_op,
161 }
162
163 action = custom_action(rsc, strdup(key), task, NULL, TRUE, FALSE, data_set);
164- if ((action->on_fail <= action_fail_fence && *on_fail < action->on_fail) ||
165- (action->on_fail == action_fail_reset_remote && *on_fail <= action_fail_recover) ||
166- (action->on_fail == action_fail_restart_container && *on_fail <= action_fail_recover) ||
167- (*on_fail == action_fail_restart_container && action->on_fail >= action_fail_migrate)) {
168+ if (cmp_on_fail(*on_fail, action->on_fail) < 0) {
169 pe_rsc_trace(rsc, "on-fail %s -> %s for %s (%s)", fail2text(*on_fail),
170 fail2text(action->on_fail), action->uuid, key);
171 *on_fail = action->on_fail;
172@@ -3630,7 +3699,8 @@ unpack_rsc_op(pe_resource_t *rsc, pe_node_t *node, xmlNode *xml_op,
173
174 record_failed_op(xml_op, node, rsc, data_set);
175
176- if (failure_strategy == action_fail_restart_container && *on_fail <= action_fail_recover) {
177+ if ((failure_strategy == action_fail_restart_container)
178+ && cmp_on_fail(*on_fail, action_fail_recover) <= 0) {
179 *on_fail = failure_strategy;
180 }
181
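
The "@TODO = N" comments in the patch above describe the numbering the enum is meant to get at the next API break. As a rough sketch of what that ordering implies, the comparison could be written as a rank lookup. This is not the upstream implementation (the patch uses nested switch statements); severity_rank() and cmp_on_fail_by_rank() are hypothetical names, and the rank numbers are copied from the @TODO comments.

```c
#include <assert.h>

/* Declaration order as shown in the patch: kept for API compatibility,
 * so it is NOT the same as severity order. */
enum action_fail_response {
    action_fail_ignore,
    action_fail_recover,
    action_fail_migrate,
    action_fail_block,
    action_fail_stop,
    action_fail_standby,
    action_fail_fence,
    action_fail_restart_container,
    action_fail_reset_remote,
    action_fail_demote,
};

/* Hypothetical helper: map each value to the severity rank given in the
 * "@TODO = N" comments of the patch. */
static int
severity_rank(enum action_fail_response value)
{
    switch (value) {
        case action_fail_ignore:            return 10;
        case action_fail_demote:            return 20;
        case action_fail_recover:           return 30;
        case action_fail_reset_remote:      return 40;
        case action_fail_restart_container: return 50;
        case action_fail_migrate:           return 60;
        case action_fail_block:             return 70;
        case action_fail_stop:              return 80;
        case action_fail_standby:           return 90;
        case action_fail_fence:             return 100;
    }
    assert(0 && "unknown on-fail value");
    return -1;
}

/* Negative if second is more severe, zero if equal, positive if first is
 * more severe (same contract as the cmp_on_fail() doc comment). */
static int
cmp_on_fail_by_rank(enum action_fail_response first,
                    enum action_fail_response second)
{
    return severity_rank(first) - severity_rank(second);
}
```

For the pairs that the patch's cmp_on_fail() handles explicitly (for example reset_remote against recover, or restart_container against reset_remote), the sign of the result is the same in this sketch.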
diff --git a/debian/patches/ubuntu-2.0.3-demote/lp1896223-02-ef246ff-Fix-scheduler-disallow-on-fail-stop-for-stop.patch b/debian/patches/ubuntu-2.0.3-demote/lp1896223-02-ef246ff-Fix-scheduler-disallow-on-fail-stop-for-stop.patch
new file mode 100644
index 0000000..72c7abc
--- /dev/null
+++ b/debian/patches/ubuntu-2.0.3-demote/lp1896223-02-ef246ff-Fix-scheduler-disallow-on-fail-stop-for-stop.patch
@@ -0,0 +1,53 @@
1From: Ken Gaillot <kgaillot@redhat.com>
2Date: Thu, 28 May 2020 08:27:47 -0500
3Subject: Fix: scheduler: disallow on-fail=stop for stop operations
4
5because it would loop infinitely as long as the stop continued to fail
6
7 [Backport]
8
9 This pacemaker version did not use pcmk__config_err() function for
10 configuration warnings. It used crm_config_err() still.
11
12Signed-off-by: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
13
14Author: Ken Gaillot <kgaillot@redhat.com>
15Origin: backport, https://github.com/ClusterLabs/pacemaker/commit/ef246ff
16Bug-Ubuntu: https://bugs.launchpad.net/bugs/1896223
17Reviewed-by: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
18Last-Update: 2020-10-05
19---
20 lib/pengine/utils.c | 14 ++++++++++++--
21 1 file changed, 12 insertions(+), 2 deletions(-)
22
23diff --git a/lib/pengine/utils.c b/lib/pengine/utils.c
24index 3fc072f..ad5a09b 100644
25--- a/lib/pengine/utils.c
26+++ b/lib/pengine/utils.c
27@@ -666,14 +666,24 @@ custom_action(resource_t * rsc, char *key, const char *task,
28 return action;
29 }
30
31+static bool
32+valid_stop_on_fail(const char *value)
33+{
34+ return safe_str_neq(value, "standby")
35+ && safe_str_neq(value, "stop");
36+}
37+
38 static const char *
39 unpack_operation_on_fail(action_t * action)
40 {
41
42 const char *value = g_hash_table_lookup(action->meta, XML_OP_ATTR_ON_FAIL);
43
44- if (safe_str_eq(action->task, CRMD_ACTION_STOP) && safe_str_eq(value, "standby")) {
45- crm_config_err("on-fail=standby is not allowed for stop actions: %s", action->rsc->id);
46+ if (safe_str_eq(action->task, CRMD_ACTION_STOP)
47+ && !valid_stop_on_fail(value)) {
48+ crm_config_err("Resetting '" XML_OP_ATTR_ON_FAIL "' for %s stop "
49+ "action to default value because '%s' is not "
50+ "allowed for stop", action->rsc->id, value);
51 return NULL;
52 } else if (safe_str_eq(action->task, CRMD_ACTION_DEMOTE) && !value) {
53 /* demote on_fail defaults to master monitor value if present */
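
To make the effect of valid_stop_on_fail() in the patch above easier to see, here is a standalone sketch. It assumes that safe_str_neq() returns true when the two strings differ and treats a NULL value as "different"; str_neq() is a hypothetical stand-in written with plain strcmp(), and the real helper may differ in details such as case sensitivity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed stand-in for Pacemaker's safe_str_neq(): true when the strings
 * differ; NULL is considered different from any non-NULL string. */
static bool
str_neq(const char *a, const char *b)
{
    if (a == b) {
        return false;
    }
    if ((a == NULL) || (b == NULL)) {
        return true;
    }
    return strcmp(a, b) != 0;
}

/* Same shape as the function added by the patch: an on-fail value is
 * acceptable for a stop action unless it is "standby" or "stop". */
static bool
valid_stop_on_fail(const char *value)
{
    return str_neq(value, "standby") && str_neq(value, "stop");
}
```

Under these assumptions an unset on-fail (NULL) stays valid, so only an explicit "stop" or "standby" triggers the configuration error and the reset to the default.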
diff --git a/debian/patches/ubuntu-2.0.3-demote/lp1896223-03-8dceba7-Refactor-scheduler-use-more-appropriate-types.patch b/debian/patches/ubuntu-2.0.3-demote/lp1896223-03-8dceba7-Refactor-scheduler-use-more-appropriate-types.patch
new file mode 100644
index 0000000..303d874
--- /dev/null
+++ b/debian/patches/ubuntu-2.0.3-demote/lp1896223-03-8dceba7-Refactor-scheduler-use-more-appropriate-types.patch
@@ -0,0 +1,43 @@
1From: Ken Gaillot <kgaillot@redhat.com>
2Date: Thu, 28 May 2020 08:50:33 -0500
3Subject: Refactor: scheduler: use more appropriate types in a couple places
4
5Author: Ken Gaillot <kgaillot@redhat.com>
6Origin: upstream, https://github.com/ClusterLabs/pacemaker/commit/8dceba7
7Bug-Ubuntu: https://bugs.launchpad.net/bugs/1896223
8Reviewed-by: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
9Last-Update: 2020-10-05
10---
11 lib/pengine/unpack.c | 5 ++---
12 1 file changed, 2 insertions(+), 3 deletions(-)
13
14diff --git a/lib/pengine/unpack.c b/lib/pengine/unpack.c
15index 514207e..8f1ac1b 100644
16--- a/lib/pengine/unpack.c
17+++ b/lib/pengine/unpack.c
18@@ -2206,7 +2206,7 @@ unpack_lrm_rsc_state(node_t * node, xmlNode * rsc_entry, pe_working_set_t * data
19 xmlNode *rsc_op = NULL;
20 xmlNode *last_failure = NULL;
21
22- enum action_fail_response on_fail = FALSE;
23+ enum action_fail_response on_fail = action_fail_ignore;
24 enum rsc_role_e saved_role = RSC_ROLE_UNKNOWN;
25
26 crm_trace("[%s] Processing %s on %s",
27@@ -2237,7 +2237,6 @@ unpack_lrm_rsc_state(node_t * node, xmlNode * rsc_entry, pe_working_set_t * data
28
29 /* process operations */
30 saved_role = rsc->role;
31- on_fail = action_fail_ignore;
32 rsc->role = RSC_ROLE_UNKNOWN;
33 sorted_op_list = g_list_sort(op_list, sort_op_by_callid);
34
35@@ -3331,7 +3330,7 @@ int pe__target_rc_from_xml(xmlNode *xml_op)
36 static enum action_fail_response
37 get_action_on_fail(resource_t *rsc, const char *key, const char *task, pe_working_set_t * data_set)
38 {
39- int result = action_fail_recover;
40+ enum action_fail_response result = action_fail_recover;
41 action_t *action = custom_action(rsc, strdup(key), task, NULL, TRUE, FALSE, data_set);
42
43 result = action->on_fail;
diff --git a/debian/patches/ubuntu-2.0.3-demote/lp1896223-04-a4d6a20-Low-libpacemaker-don-t-force-stop-when-skipping.patch b/debian/patches/ubuntu-2.0.3-demote/lp1896223-04-a4d6a20-Low-libpacemaker-don-t-force-stop-when-skipping.patch
new file mode 100644
index 0000000..baad36a
--- /dev/null
+++ b/debian/patches/ubuntu-2.0.3-demote/lp1896223-04-a4d6a20-Low-libpacemaker-don-t-force-stop-when-skipping.patch
@@ -0,0 +1,45 @@
1From: Ken Gaillot <kgaillot@redhat.com>
2Date: Tue, 2 Jun 2020 12:05:57 -0500
3Subject: Low: libpacemaker: don't force stop when skipping reload of failed
4 resource
5
6Normal failure recovery will apply, which will stop if needed.
7
8(The stop was forced as of 2558d76f.)
9
10Author: Ken Gaillot <kgaillot@redhat.com>
11Origin: upstream, https://github.com/ClusterLabs/pacemaker/commit/a4d6a20
12Bug-Ubuntu: https://bugs.launchpad.net/bugs/1896223
13Reviewed-by: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
14Last-Update: 2020-10-05
15---
16 lib/pacemaker/pcmk_sched_native.c | 16 +++++++++++++---
17 1 file changed, 13 insertions(+), 3 deletions(-)
18
19diff --git a/lib/pacemaker/pcmk_sched_native.c b/lib/pacemaker/pcmk_sched_native.c
20index bbf3eb7..04552c4 100644
21--- a/lib/pacemaker/pcmk_sched_native.c
22+++ b/lib/pacemaker/pcmk_sched_native.c
23@@ -3270,9 +3270,19 @@ ReloadRsc(resource_t * rsc, node_t *node, pe_working_set_t * data_set)
24 pe_rsc_trace(rsc, "%s: unmanaged", rsc->id);
25 return;
26
27- } else if (is_set(rsc->flags, pe_rsc_failed) || is_set(rsc->flags, pe_rsc_start_pending)) {
28- pe_rsc_trace(rsc, "%s: general resource state: flags=0x%.16llx", rsc->id, rsc->flags);
29- stop_action(rsc, node, FALSE); /* Force a full restart, overkill? */
30+ } else if (is_set(rsc->flags, pe_rsc_failed)) {
31+ /* We don't need to specify any particular actions here, normal failure
32+ * recovery will apply.
33+ */
34+ pe_rsc_trace(rsc, "%s: preventing reload because failed", rsc->id);
35+ return;
36+
37+ } else if (is_set(rsc->flags, pe_rsc_start_pending)) {
38+ /* If a resource's configuration changed while a start was pending,
39+ * force a full restart.
40+ */
41+ pe_rsc_trace(rsc, "%s: preventing reload because start pending", rsc->id);
42+ stop_action(rsc, node, FALSE);
43 return;
44
45 } else if (node == NULL) {
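
The behavioural change in the ReloadRsc() hunk above can be summarised as a small decision function. This is a simplified sketch only: the flag bits and the names SK_FAILED, SK_START_PENDING and decide_reload() are hypothetical, and the other checks in the real function (unmanaged resource, missing node) are left out.

```c
#include <stdint.h>

/* Hypothetical flag bits for this sketch; the real pe_rsc_* values are
 * defined in pe_types.h. */
#define SK_FAILED        0x1ULL
#define SK_START_PENDING 0x2ULL

enum reload_decision {
    RELOAD_CONTINUE_CHECKS, /* neither flag set: carry on with later checks */
    RELOAD_SKIP,            /* failed: skip reload, normal recovery applies */
    RELOAD_SKIP_AND_STOP,   /* start pending: skip reload, force a stop */
};

static enum reload_decision
decide_reload(uint64_t flags)
{
    if (flags & SK_FAILED) {
        /* After the patch no stop is forced here */
        return RELOAD_SKIP;
    }
    if (flags & SK_START_PENDING) {
        /* Configuration changed while a start was pending: full restart */
        return RELOAD_SKIP_AND_STOP;
    }
    return RELOAD_CONTINUE_CHECKS;
}
```

Before the patch both flags led to the forced stop; in this sketch a failed resource takes the first branch even when a start is also pending, which mirrors the order of the two else-if branches in the hunk.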
diff --git a/debian/patches/ubuntu-2.0.3-demote/lp1896223-05-98c3b64-Log-libpacemaker-check-for-re-promotes-specifically.patch b/debian/patches/ubuntu-2.0.3-demote/lp1896223-05-98c3b64-Log-libpacemaker-check-for-re-promotes-specifically.patch
new file mode 100644
index 0000000..7620a7c
--- /dev/null
+++ b/debian/patches/ubuntu-2.0.3-demote/lp1896223-05-98c3b64-Log-libpacemaker-check-for-re-promotes-specifically.patch
@@ -0,0 +1,46 @@
1From: Ken Gaillot <kgaillot@redhat.com>
2Date: Mon, 13 Apr 2020 12:23:22 -0500
3Subject: Log: libpacemaker: check for re-promotes specifically
4
5If a promotable clone instance is being demoted and promoted on its current
6node, without also stopping and starting, it previously would be logged as
7"Leave" indicating unchanged, because the current and next role are the same.
8
9Now, check for this situation specifically, and log it as "Re-promote".
10
11Currently, the scheduler is not capable of generating this situation, but
12upcoming changes will.
13
14Author: Ken Gaillot <kgaillot@redhat.com>
15Origin: upstream, https://github.com/ClusterLabs/pacemaker/commit/98c3b64
16Bug-Ubuntu: https://bugs.launchpad.net/bugs/1896223
17Reviewed-by: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
18Last-Update: 2020-10-05
19---
20 lib/pacemaker/pcmk_sched_native.c | 12 ++++++++++--
21 1 file changed, 10 insertions(+), 2 deletions(-)
22
23diff --git a/lib/pacemaker/pcmk_sched_native.c b/lib/pacemaker/pcmk_sched_native.c
24index 04552c4..7c193fa 100644
25--- a/lib/pacemaker/pcmk_sched_native.c
26+++ b/lib/pacemaker/pcmk_sched_native.c
27@@ -2466,9 +2466,17 @@ LogActions(resource_t * rsc, pe_working_set_t * data_set, gboolean terminal)
28 } else if (is_set(rsc->flags, pe_rsc_reload)) {
29 LogAction("Reload", rsc, current, next, start, NULL, terminal);
30
31+
32 } else if (start == NULL || is_set(start->flags, pe_action_optional)) {
33- pe_rsc_info(rsc, "Leave %s\t(%s %s)", rsc->id, role2text(rsc->role),
34- next->details->uname);
35+ if ((demote != NULL) && (promote != NULL)
36+ && is_not_set(demote->flags, pe_action_optional)
37+ && is_not_set(promote->flags, pe_action_optional)) {
38+ LogAction("Re-promote", rsc, current, next, promote, demote,
39+ terminal);
40+ } else {
41+ pe_rsc_info(rsc, "Leave %s\t(%s %s)", rsc->id,
42+ role2text(rsc->role), next->details->uname);
43+ }
44
45 } else if (start && is_set(start->flags, pe_action_runnable) == FALSE) {
46 LogAction("Stop", rsc, current, NULL, stop,
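
The condition added to LogActions() above can be read as follows: when there is no required start, a required demote together with a required promote is logged as "Re-promote", otherwise as "Leave". A minimal sketch of that branch, with hypothetical simplified types (struct sketch_action, choose_log()) standing in for the scheduler's action structures and flag tests:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a scheduler action; only the
 * "optional" property matters for this decision. */
struct sketch_action {
    bool optional;
};

enum log_choice {
    SKETCH_LEAVE,
    SKETCH_RE_PROMOTE,
    SKETCH_OTHER,       /* handled by later branches of the real function */
};

static enum log_choice
choose_log(const struct sketch_action *start,
           const struct sketch_action *demote,
           const struct sketch_action *promote)
{
    if ((start == NULL) || start->optional) {
        if ((demote != NULL) && (promote != NULL)
            && !demote->optional && !promote->optional) {
            return SKETCH_RE_PROMOTE;
        }
        return SKETCH_LEAVE;
    }
    return SKETCH_OTHER;
}
```

The earlier branches of the real function (for example the reload case) are not modelled here.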
diff --git a/debian/patches/ubuntu-2.0.3-demote/lp1896223-06-2f1e2df-Feature-xml-add-on-fail-demote-option-to-resources.patch b/debian/patches/ubuntu-2.0.3-demote/lp1896223-06-2f1e2df-Feature-xml-add-on-fail-demote-option-to-resources.patch
new file mode 100644
index 0000000..d6aeb97
--- /dev/null
+++ b/debian/patches/ubuntu-2.0.3-demote/lp1896223-06-2f1e2df-Feature-xml-add-on-fail-demote-option-to-resources.patch
@@ -0,0 +1,38 @@
1From: Ken Gaillot <kgaillot@redhat.com>
2Date: Tue, 26 May 2020 17:50:48 -0500
3Subject: Feature: xml: add on-fail="demote" option to resources schema
4
5We don't need an XML schema version bump because it was already bumped since
6the last release, for the rsc_expression/op_expression feature.
7
8 [Backport]
9
10 Original patch changes xml/resources-3.4.rng. As we're backporting
11 features to Ubuntu 2.0.4 release, which only defines schema up to
12 xml/resources-3.2.rng, and using a new minor version only, for this
13 new Ubuntu only feature set (3.3.1), this patch adds the feature to
14 the 3.2 resources schema instead of a 3.4.
15
16Signed-off-by: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
17
18Author: Ken Gaillot <kgaillot@redhat.com>
19Origin: backport, https://github.com/ClusterLabs/pacemaker/commit/2f1e2df
20Bug-Ubuntu: https://bugs.launchpad.net/bugs/1896223
21Reviewed-by: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
22Last-Update: 2020-10-05
23---
24 xml/resources-3.2.rng | 1 +
25 1 file changed, 1 insertion(+)
26
27diff --git a/xml/resources-3.2.rng b/xml/resources-3.2.rng
28index 44656d6..1930508 100644
29--- a/xml/resources-3.2.rng
30+++ b/xml/resources-3.2.rng
31@@ -388,6 +388,7 @@
32 <choice>
33 <value>ignore</value>
34 <value>block</value>
35+ <value>demote</value>
36 <value>stop</value>
37 <value>restart</value>
38 <value>standby</value>
diff --git a/debian/patches/ubuntu-2.0.3-demote/lp1896223-07-874f75e-Feature-scheduler-new-on-fail-demote-recovery-policy.patch b/debian/patches/ubuntu-2.0.3-demote/lp1896223-07-874f75e-Feature-scheduler-new-on-fail-demote-recovery-policy.patch
new file mode 100644
index 0000000..ae96daf
--- /dev/null
+++ b/debian/patches/ubuntu-2.0.3-demote/lp1896223-07-874f75e-Feature-scheduler-new-on-fail-demote-recovery-policy.patch
@@ -0,0 +1,355 @@
1From: Ken Gaillot <kgaillot@redhat.com>
2Date: Thu, 28 May 2020 08:29:37 -0500
3Subject: Feature: scheduler: new on-fail="demote" recovery policy for
4 promoted resources
5
6Author: Ken Gaillot <kgaillot@redhat.com>
7Origin: upstream, https://github.com/ClusterLabs/pacemaker/commit/874f75e
8Bug-Ubuntu: https://bugs.launchpad.net/bugs/1896223
9Reviewed-by: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
10Last-Update: 2020-10-05
11---
12 include/crm/pengine/pe_types.h | 1 +
13 lib/pacemaker/pcmk_sched_native.c | 25 +++++++++++++++----
14 lib/pengine/common.c | 3 +++
15 lib/pengine/unpack.c | 51 ++++++++++++++++++++++++++++++++++++---
16 lib/pengine/utils.c | 34 ++++++++++++++++++++++----
17 5 files changed, 101 insertions(+), 13 deletions(-)
18
19diff --git a/include/crm/pengine/pe_types.h b/include/crm/pengine/pe_types.h
20index 23e1c46..6e5cbcc 100644
21--- a/include/crm/pengine/pe_types.h
22+++ b/include/crm/pengine/pe_types.h
23@@ -235,6 +235,7 @@ struct pe_node_s {
24 # define pe_rsc_allocating 0x00000200ULL
25 # define pe_rsc_merging 0x00000400ULL
26
27+# define pe_rsc_stop 0x00001000ULL
28 # define pe_rsc_reload 0x00002000ULL
29 # define pe_rsc_allow_remote_remotes 0x00004000ULL
30
31diff --git a/lib/pacemaker/pcmk_sched_native.c b/lib/pacemaker/pcmk_sched_native.c
32index 7c193fa..3ce75b8 100644
33--- a/lib/pacemaker/pcmk_sched_native.c
34+++ b/lib/pacemaker/pcmk_sched_native.c
35@@ -1122,6 +1122,7 @@ native_create_actions(resource_t * rsc, pe_working_set_t * data_set)
36 node_t *chosen = NULL;
37 node_t *current = NULL;
38 gboolean need_stop = FALSE;
39+ bool need_promote = FALSE;
40 gboolean is_moving = FALSE;
41 gboolean allow_migrate = is_set(rsc->flags, pe_rsc_allow_migrate) ? TRUE : FALSE;
42
43@@ -1226,8 +1227,15 @@ native_create_actions(resource_t * rsc, pe_working_set_t * data_set)
44 need_stop = TRUE;
45
46 } else if (is_set(rsc->flags, pe_rsc_failed)) {
47- pe_rsc_trace(rsc, "Recovering %s", rsc->id);
48- need_stop = TRUE;
49+ if (is_set(rsc->flags, pe_rsc_stop)) {
50+ need_stop = TRUE;
51+ pe_rsc_trace(rsc, "Recovering %s", rsc->id);
52+ } else {
53+ pe_rsc_trace(rsc, "Recovering %s by demotion", rsc->id);
54+ if (rsc->next_role == RSC_ROLE_MASTER) {
55+ need_promote = TRUE;
56+ }
57+ }
58
59 } else if (is_set(rsc->flags, pe_rsc_block)) {
60 pe_rsc_trace(rsc, "Block %s", rsc->id);
61@@ -1261,10 +1269,16 @@ native_create_actions(resource_t * rsc, pe_working_set_t * data_set)
62
63
64 while (rsc->role <= rsc->next_role && role != rsc->role && is_not_set(rsc->flags, pe_rsc_block)) {
65+ bool required = need_stop;
66+
67 next_role = rsc_state_matrix[role][rsc->role];
68+ if ((next_role == RSC_ROLE_MASTER) && need_promote) {
69+ required = true;
70+ }
71 pe_rsc_trace(rsc, "Up: Executing: %s->%s (%s)%s", role2text(role), role2text(next_role),
72- rsc->id, need_stop ? " required" : "");
73- if (rsc_action_matrix[role][next_role] (rsc, chosen, !need_stop, data_set) == FALSE) {
74+ rsc->id, (required? " required" : ""));
75+ if (rsc_action_matrix[role][next_role](rsc, chosen, !required,
76+ data_set) == FALSE) {
77 break;
78 }
79 role = next_role;
80@@ -2527,7 +2541,8 @@ LogActions(resource_t * rsc, pe_working_set_t * data_set, gboolean terminal)
81
82 free(key);
83
84- } else if (stop && is_set(rsc->flags, pe_rsc_failed)) {
85+ } else if (stop && is_set(rsc->flags, pe_rsc_failed)
86+ && is_set(rsc->flags, pe_rsc_stop)) {
87 /* 'stop' may be NULL if the failure was ignored */
88 LogAction("Recover", rsc, current, next, stop, start, terminal);
89 STOP_SANITY_ASSERT(__LINE__);
90diff --git a/lib/pengine/common.c b/lib/pengine/common.c
91index da39c99..fcd7cf0 100644
92--- a/lib/pengine/common.c
93+++ b/lib/pengine/common.c
94@@ -198,6 +198,9 @@ fail2text(enum action_fail_response fail)
95 case action_fail_ignore:
96 result = "ignore";
97 break;
98+ case action_fail_demote:
99+ result = "demote";
100+ break;
101 case action_fail_block:
102 result = "block";
103 break;
104diff --git a/lib/pengine/unpack.c b/lib/pengine/unpack.c
105index 8f1ac1b..e690c4e 100644
106--- a/lib/pengine/unpack.c
107+++ b/lib/pengine/unpack.c
108@@ -100,6 +100,7 @@ pe_fence_node(pe_working_set_t * data_set, node_t * node, const char *reason)
109 */
110 node->details->remote_requires_reset = TRUE;
111 set_bit(rsc->flags, pe_rsc_failed);
112+ set_bit(rsc->flags, pe_rsc_stop);
113 }
114 }
115
116@@ -109,6 +110,7 @@ pe_fence_node(pe_working_set_t * data_set, node_t * node, const char *reason)
117 "and guest resource no longer exists",
118 node->details->uname, reason);
119 set_bit(node->details->remote_rsc->flags, pe_rsc_failed);
120+ set_bit(node->details->remote_rsc->flags, pe_rsc_stop);
121
122 } else if (pe__is_remote_node(node)) {
123 resource_t *rsc = node->details->remote_rsc;
124@@ -1898,6 +1900,7 @@ process_rsc_state(resource_t * rsc, node_t * node,
125 */
126 if (pe__is_guest_node(node)) {
127 set_bit(rsc->flags, pe_rsc_failed);
128+ set_bit(rsc->flags, pe_rsc_stop);
129 should_fence = TRUE;
130
131 } else if (is_set(data_set->flags, pe_flag_stonith_enabled)) {
132@@ -1940,6 +1943,11 @@ process_rsc_state(resource_t * rsc, node_t * node,
133 /* nothing to do */
134 break;
135
136+ case action_fail_demote:
137+ set_bit(rsc->flags, pe_rsc_failed);
138+ demote_action(rsc, node, FALSE);
139+ break;
140+
141 case action_fail_fence:
142 /* treat it as if it is still running
143 * but also mark the node as unclean
144@@ -1976,12 +1984,14 @@ process_rsc_state(resource_t * rsc, node_t * node,
145 case action_fail_recover:
146 if (rsc->role != RSC_ROLE_STOPPED && rsc->role != RSC_ROLE_UNKNOWN) {
147 set_bit(rsc->flags, pe_rsc_failed);
148+ set_bit(rsc->flags, pe_rsc_stop);
149 stop_action(rsc, node, FALSE);
150 }
151 break;
152
153 case action_fail_restart_container:
154 set_bit(rsc->flags, pe_rsc_failed);
155+ set_bit(rsc->flags, pe_rsc_stop);
156
157 if (rsc->container && pe_rsc_is_bundled(rsc)) {
158 /* A bundle's remote connection can run on a different node than
159@@ -2000,6 +2010,7 @@ process_rsc_state(resource_t * rsc, node_t * node,
160
161 case action_fail_reset_remote:
162 set_bit(rsc->flags, pe_rsc_failed);
163+ set_bit(rsc->flags, pe_rsc_stop);
164 if (is_set(data_set->flags, pe_flag_stonith_enabled)) {
165 tmpnode = NULL;
166 if (rsc->is_remote_node) {
167@@ -2054,8 +2065,17 @@ process_rsc_state(resource_t * rsc, node_t * node,
168 }
169
170 native_add_running(rsc, node, data_set);
171- if (on_fail != action_fail_ignore) {
172- set_bit(rsc->flags, pe_rsc_failed);
173+ switch (on_fail) {
174+ case action_fail_ignore:
175+ break;
176+ case action_fail_demote:
177+ case action_fail_block:
178+ set_bit(rsc->flags, pe_rsc_failed);
179+ break;
180+ default:
181+ set_bit(rsc->flags, pe_rsc_failed);
182+ set_bit(rsc->flags, pe_rsc_stop);
183+ break;
184 }
185
186 } else if (rsc->clone_name && strchr(rsc->clone_name, ':') != NULL) {
187@@ -2549,6 +2569,7 @@ unpack_migrate_to_success(pe_resource_t *rsc, pe_node_t *node, xmlNode *xml_op,
188 } else {
189 /* Consider it failed here - forces a restart, prevents migration */
190 set_bit(rsc->flags, pe_rsc_failed);
191+ set_bit(rsc->flags, pe_rsc_stop);
192 clear_bit(rsc->flags, pe_rsc_allow_migrate);
193 }
194 }
195@@ -2739,9 +2760,21 @@ static int
196 cmp_on_fail(enum action_fail_response first, enum action_fail_response second)
197 {
198 switch (first) {
199+ case action_fail_demote:
200+ switch (second) {
201+ case action_fail_ignore:
202+ return 1;
203+ case action_fail_demote:
204+ return 0;
205+ default:
206+ return -1;
207+ }
208+ break;
209+
210 case action_fail_reset_remote:
211 switch (second) {
212 case action_fail_ignore:
213+ case action_fail_demote:
214 case action_fail_recover:
215 return 1;
216 case action_fail_reset_remote:
217@@ -2754,6 +2787,7 @@ cmp_on_fail(enum action_fail_response first, enum action_fail_response second)
218 case action_fail_restart_container:
219 switch (second) {
220 case action_fail_ignore:
221+ case action_fail_demote:
222 case action_fail_recover:
223 case action_fail_reset_remote:
224 return 1;
225@@ -2768,9 +2802,13 @@ cmp_on_fail(enum action_fail_response first, enum action_fail_response second)
226 break;
227 }
228 switch (second) {
229+ case action_fail_demote:
230+ return (first == action_fail_ignore)? -1 : 1;
231+
232 case action_fail_reset_remote:
233 switch (first) {
234 case action_fail_ignore:
235+ case action_fail_demote:
236 case action_fail_recover:
237 return -1;
238 default:
239@@ -2781,6 +2819,7 @@ cmp_on_fail(enum action_fail_response first, enum action_fail_response second)
240 case action_fail_restart_container:
241 switch (first) {
242 case action_fail_ignore:
243+ case action_fail_demote:
244 case action_fail_recover:
245 case action_fail_reset_remote:
246 return -1;
247@@ -3381,7 +3420,11 @@ update_resource_state(resource_t * rsc, node_t * node, xmlNode * xml_op, const c
248 clear_past_failure = TRUE;
249
250 } else if (safe_str_eq(task, CRMD_ACTION_DEMOTE)) {
251- /* Demote from Master does not clear an error */
252+
253+ if (*on_fail == action_fail_demote) {
254+ // Demote clears an error only if on-fail=demote
255+ clear_past_failure = TRUE;
256+ }
257 rsc->role = RSC_ROLE_SLAVE;
258
259 } else if (safe_str_eq(task, CRMD_ACTION_MIGRATED)) {
260@@ -3409,6 +3452,7 @@ update_resource_state(resource_t * rsc, node_t * node, xmlNode * xml_op, const c
261
262 case action_fail_block:
263 case action_fail_ignore:
264+ case action_fail_demote:
265 case action_fail_recover:
266 case action_fail_restart_container:
267 *on_fail = action_fail_ignore;
268@@ -3669,6 +3713,7 @@ unpack_rsc_op(pe_resource_t *rsc, pe_node_t *node, xmlNode *xml_op,
269 * that, ensure the remote connection is considered failed.
270 */
271 set_bit(node->details->remote_rsc->flags, pe_rsc_failed);
272+ set_bit(node->details->remote_rsc->flags, pe_rsc_stop);
273 }
274
275 // fall through
276diff --git a/lib/pengine/utils.c b/lib/pengine/utils.c
277index ad5a09b..e57d858 100644
278--- a/lib/pengine/utils.c
279+++ b/lib/pengine/utils.c
280@@ -670,6 +670,7 @@ static bool
281 valid_stop_on_fail(const char *value)
282 {
283 return safe_str_neq(value, "standby")
284+ && safe_str_neq(value, "demote")
285 && safe_str_neq(value, "stop");
286 }
287
288@@ -677,6 +678,11 @@ static const char *
289 unpack_operation_on_fail(action_t * action)
290 {
291
292+ const char *name = NULL;
293+ const char *role = NULL;
294+ const char *on_fail = NULL;
295+ const char *interval_spec = NULL;
296+ const char *enabled = NULL;
297 const char *value = g_hash_table_lookup(action->meta, XML_OP_ATTR_ON_FAIL);
298
299 if (safe_str_eq(action->task, CRMD_ACTION_STOP)
300@@ -685,14 +691,10 @@ unpack_operation_on_fail(action_t * action)
301 "action to default value because '%s' is not "
302 "allowed for stop", action->rsc->id, value);
303 return NULL;
304+
305 } else if (safe_str_eq(action->task, CRMD_ACTION_DEMOTE) && !value) {
306 /* demote on_fail defaults to master monitor value if present */
307 xmlNode *operation = NULL;
308- const char *name = NULL;
309- const char *role = NULL;
310- const char *on_fail = NULL;
311- const char *interval_spec = NULL;
312- const char *enabled = NULL;
313
314 CRM_CHECK(action->rsc != NULL, return NULL);
315
316@@ -715,10 +717,28 @@ unpack_operation_on_fail(action_t * action)
317 continue;
318 } else if (crm_parse_interval_spec(interval_spec) == 0) {
319 continue;
320+ } else if (safe_str_eq(on_fail, "demote")) {
321+ continue;
322 }
323
324 value = on_fail;
325 }
326+ } else if (safe_str_eq(value, "demote")) {
327+ name = crm_element_value(action->op_entry, "name");
328+ role = crm_element_value(action->op_entry, "role");
329+ on_fail = crm_element_value(action->op_entry, XML_OP_ATTR_ON_FAIL);
330+ interval_spec = crm_element_value(action->op_entry,
331+ XML_LRM_ATTR_INTERVAL);
332+
333+ if (safe_str_neq(name, CRMD_ACTION_PROMOTE)
334+ && (safe_str_neq(name, CRMD_ACTION_STATUS)
335+ || safe_str_neq(role, "Master")
336+ || (crm_parse_interval_spec(interval_spec) == 0))) {
337+ crm_config_err("Resetting '" XML_OP_ATTR_ON_FAIL "' for %s %s "
338+ "action to default value because 'demote' is not "
339+ "allowed for it", action->rsc->id, name);
340+ return NULL;
341+ }
342 }
343
344 return value;
345@@ -1097,6 +1117,10 @@ unpack_operation(action_t * action, xmlNode * xml_obj, resource_t * container,
346 value = NULL;
347 }
348
349+ } else if (safe_str_eq(value, "demote")) {
350+ action->on_fail = action_fail_demote;
351+ value = "demote instance";
352+
353 } else {
354 pe_err("Resource %s: Unknown failure type (%s)", action->rsc->id, value);
355 value = NULL;
diff --git a/debian/patches/ubuntu-2.0.3-demote/lp1896223-08-7eec572-Build-libcrmcommon-bump-CRM-feature-set.patch b/debian/patches/ubuntu-2.0.3-demote/lp1896223-08-7eec572-Build-libcrmcommon-bump-CRM-feature-set.patch
new file mode 100644
index 0000000..95033ed
--- /dev/null
+++ b/debian/patches/ubuntu-2.0.3-demote/lp1896223-08-7eec572-Build-libcrmcommon-bump-CRM-feature-set.patch
@@ -0,0 +1,51 @@
1From: Ken Gaillot <kgaillot@redhat.com>
2Date: Fri, 5 Jun 2020 10:02:05 -0500
3Subject: Build: libcrmcommon: bump CRM feature set
4
5... for op_expression/rsc_expression rules, on-fail=demote, and
6no-quorum-policy=demote
7
8 [Backport]
9
10 The features op_expression/rsc_expression are not included in this
11 Ubuntu pacemaker release. The features being backported are only
12 on-fail/no-quorum-policy=demote. For this reason, instead of using the
13 upstream feature set version (3.4.0), I'm using Ubuntu's own feature set
14 version 3.2.1 (using a minor-minor version).
15
16 Note: I have done the same thing in Pacemaker for Ubuntu Groovy, but,
17 because that pacemaker had feature set version 3.3.0, I used version
18 3.3.1.
19
20 There is no problem in having 3.2.1 supporting the feature, together
21 with 3.3.1, as the minor-minor version serves for that purpose: to
22 backport features in pacemaker distribution versions. When a cluster
23 is upgraded from Focal (3.2.1) to Groovy (3.3.1), the feature will
24 exist there as well.
25
26Signed-off-by: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
27
28Author: Ken Gaillot <kgaillot@redhat.com>
29Origin: backport, https://github.com/ClusterLabs/pacemaker/commit/7eec572
30Bug-Ubuntu: https://bugs.launchpad.net/bugs/1896223
31Reviewed-by: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
32Last-Update: 2020-10-05
33---
34 include/crm/crm.h | 3 ++-
35 1 file changed, 2 insertions(+), 1 deletion(-)
36
37diff --git a/include/crm/crm.h b/include/crm/crm.h
38index cbf72d3..35928bb 100644
39--- a/include/crm/crm.h
40+++ b/include/crm/crm.h
41@@ -50,8 +50,9 @@ extern "C" {
42 * XML v2 patchsets are created by default
43 * >=3.0.13: Fail counts include operation name and interval
44 * >=3.2.0: DC supports PCMK_LRM_OP_INVALID and PCMK_LRM_OP_NOT_CONNECTED
45+ * >=3.2.1: UBUNTU: on-fail=demote + no-quorum-policy=demote (3.4.0 backport)
46 */
47-# define CRM_FEATURE_SET "3.2.0"
48+# define CRM_FEATURE_SET "3.2.1"
49
50 # define EOS '\0'
51 # define DIMOF(a) ((int) (sizeof(a)/sizeof(a[0])) )
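
Note on the feature set bump above: the backport message relies on dotted feature-set strings ordering component by component, so that 3.2.0 < 3.2.1 < 3.3.0 < 3.3.1 < 3.4.0. The following is a minimal standalone sketch of such a comparison, under that assumption. It is not Pacemaker's own version-comparison code, and `sketch_compare_feature_set` and `sketch_feature_set_examples` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not part of Pacemaker): compare two dotted numeric
 * versions such as "3.2.1". Returns -1, 0, or 1. Missing components count
 * as 0, so "3.2" compares equal to "3.2.0".
 */
static int
sketch_compare_feature_set(const char *a, const char *b)
{
    while ((*a != '\0') || (*b != '\0')) {
        char *end_a = NULL;
        char *end_b = NULL;
        long part_a = strtol(a, &end_a, 10);
        long part_b = strtol(b, &end_b, 10);

        if (part_a != part_b) {
            return (part_a < part_b)? -1 : 1;
        }

        /* Skip one character if nothing was parsed, so the loop always advances */
        if ((end_a == a) && (*a != '\0')) {
            end_a++;
        }
        if ((end_b == b) && (*b != '\0')) {
            end_b++;
        }

        /* Step over the '.' separating this component from the next */
        a = (*end_a == '.')? end_a + 1 : end_a;
        b = (*end_b == '.')? end_b + 1 : end_b;
    }
    return 0;
}

/* Usage: the versions named in the backport note, in increasing order */
void
sketch_feature_set_examples(void)
{
    assert(sketch_compare_feature_set("3.2.0", "3.2.1") < 0);  /* Focal before/after */
    assert(sketch_compare_feature_set("3.2.1", "3.3.0") < 0);  /* Focal vs. Groovy base */
    assert(sketch_compare_feature_set("3.3.1", "3.4.0") < 0);  /* Groovy vs. upstream */
    assert(sketch_compare_feature_set("3.2.1", "3.2.1") == 0);
}
```

With this ordering, a third ("minor-minor") component can mark a distribution backport without colliding with the next upstream minor version.
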
diff --git a/debian/patches/ubuntu-2.0.3-demote/lp1896223-09-204961e-Doc-Pacemaker-Explained-document-new-on-fail.patch b/debian/patches/ubuntu-2.0.3-demote/lp1896223-09-204961e-Doc-Pacemaker-Explained-document-new-on-fail.patch
new file mode 100644
index 0000000..d993858
--- /dev/null
+++ b/debian/patches/ubuntu-2.0.3-demote/lp1896223-09-204961e-Doc-Pacemaker-Explained-document-new-on-fail.patch
@@ -0,0 +1,67 @@
1From: Ken Gaillot <kgaillot@redhat.com>
2Date: Tue, 26 May 2020 18:04:32 -0500
3Subject: Doc: Pacemaker Explained: document new on-fail="demote" option
4
5Author: Ken Gaillot <kgaillot@redhat.com>
6Origin: upstream, https://github.com/ClusterLabs/pacemaker/commit/204961e
7Bug-Ubuntu: https://bugs.launchpad.net/bugs/1896223
8Reviewed-by: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
9Last-Update: 2020-10-05
10---
11 doc/Pacemaker_Explained/en-US/Ch-Resources.txt | 36 ++++++++++++++++++++++++++
12 1 file changed, 36 insertions(+)
13
14diff --git a/doc/Pacemaker_Explained/en-US/Ch-Resources.txt b/doc/Pacemaker_Explained/en-US/Ch-Resources.txt
15index d8e7115..9df9243 100644
16--- a/doc/Pacemaker_Explained/en-US/Ch-Resources.txt
17+++ b/doc/Pacemaker_Explained/en-US/Ch-Resources.txt
18@@ -676,6 +676,10 @@ a|The action to take if this action ever fails. Allowed values:
19 * +ignore:+ Pretend the resource did not fail.
20 * +block:+ Don't perform any further operations on the resource.
21 * +stop:+ Stop the resource and do not start it elsewhere.
22+* +demote:+ Demote the resource, without a full restart. This is valid only for
23+ +promote+ actions, and for +monitor+ actions with both a nonzero +interval+
24+ and +role+ set to +Master+; for any other action, a configuration error will
25+ be logged, and the default behavior will be used.
26 * +restart:+ Stop the resource and start it again (possibly on a different node).
27 * +fence:+ STONITH the node on which the resource failed.
28 * +standby:+ Move _all_ resources away from the node on which the resource failed.
29@@ -714,6 +718,38 @@ indexterm:[Action,Property,on-fail]
30
31 |=========================================================
32
33+[NOTE]
34+====
35+When +on-fail+ is set to +demote+, recovery from failure by a successful demote
36+causes the cluster to recalculate whether and where a new instance should be
37+promoted. The node with the failure is eligible, so if master scores have not
38+changed, it will be promoted again.
39+
40+There is no direct equivalent of +migration-threshold+ for the master role, but
41+the same effect can be achieved with a location constraint using a
42+<<ch-rules,rule>> with a node attribute expression for the resource's fail
43+count.
44+
45+For example, to immediately ban the master role from a node with any failed
46+promote or master monitor:
47+[source,XML]
48+----
49+<rsc_location id="loc1" rsc="my_primitive">
50+ <rule id="rule1" score="-INFINITY" role="Master" boolean-op="or">
51+ <expression id="expr1" attribute="fail-count-my_primitive#promote_0"
52+ operation="gte" value="1"/>
53+ <expression id="expr2" attribute="fail-count-my_primitive#monitor_10000"
54+ operation="gte" value="1"/>
55+ </rule>
56+</rsc_location>
57+----
58+
59+This example assumes that there is a promotable clone of the +my_primitive+
60+resource (note that the primitive name, not the clone name, is used in the
61+rule), and that there is a recurring 10-second-interval monitor configured for
62+the master role (fail count attributes specify the interval in milliseconds).
63+====
64+
65 [[s-resource-monitoring]]
66 === Monitoring Resources for Failure ===
67
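
Note on the documentation patch above: the rule for where on-fail="demote" is accepted can be sketched as a small predicate. This is a simplified illustration of the documented rule (promote actions, or monitor actions with role Master and a nonzero interval), not the code in `unpack_operation_on_fail()`. The function names and the millisecond interval parameter are assumptions made for the example, and the string comparison here is case-sensitive for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical predicate: may this operation use on-fail="demote"?
 * op_name:     operation name, e.g. "promote" or "monitor"
 * role:        the operation's role attribute, or NULL if unset
 * interval_ms: the operation's interval in milliseconds (0 = not recurring)
 */
static bool
sketch_demote_on_fail_allowed(const char *op_name, const char *role,
                              unsigned int interval_ms)
{
    if (op_name == NULL) {
        return false;
    }
    if (strcmp(op_name, "promote") == 0) {
        return true;    /* always allowed for promote actions */
    }
    /* Otherwise only a recurring monitor for the Master role qualifies */
    return (strcmp(op_name, "monitor") == 0)
           && (role != NULL) && (strcmp(role, "Master") == 0)
           && (interval_ms > 0);
}

/* Usage: the cases named in the documentation text */
void
sketch_demote_on_fail_examples(void)
{
    assert(sketch_demote_on_fail_allowed("promote", NULL, 0));
    assert(sketch_demote_on_fail_allowed("monitor", "Master", 10000));
    assert(!sketch_demote_on_fail_allowed("monitor", "Master", 0));
    assert(!sketch_demote_on_fail_allowed("start", NULL, 0));
}
```

For any operation where the predicate is false, the documentation says a configuration error is logged and the default behavior is used.
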
diff --git a/debian/patches/ubuntu-2.0.3-demote/lp1896223-10-015b5c0-Doc-Pacemaker-Explained-document-no-quorum.patch b/debian/patches/ubuntu-2.0.3-demote/lp1896223-10-015b5c0-Doc-Pacemaker-Explained-document-no-quorum.patch
new file mode 100644
index 0000000..2573479
--- /dev/null
+++ b/debian/patches/ubuntu-2.0.3-demote/lp1896223-10-015b5c0-Doc-Pacemaker-Explained-document-no-quorum.patch
@@ -0,0 +1,26 @@
1From: Ken Gaillot <kgaillot@redhat.com>
2Date: Thu, 28 May 2020 12:13:20 -0500
3Subject: Doc: Pacemaker Explained: document no-quorum-policy=demote
4
5Author: Ken Gaillot <kgaillot@redhat.com>
6Origin: upstream, https://github.com/ClusterLabs/pacemaker/commit/015b5c0
7Bug-Ubuntu: https://bugs.launchpad.net/bugs/1896223
8Reviewed-by: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
9Last-Update: 2020-10-05
10---
11 doc/Pacemaker_Explained/en-US/Ch-Options.txt | 2 ++
12 1 file changed, 2 insertions(+)
13
14diff --git a/doc/Pacemaker_Explained/en-US/Ch-Options.txt b/doc/Pacemaker_Explained/en-US/Ch-Options.txt
15index f864987..d344ecd 100644
16--- a/doc/Pacemaker_Explained/en-US/Ch-Options.txt
17+++ b/doc/Pacemaker_Explained/en-US/Ch-Options.txt
18@@ -181,6 +181,8 @@ What to do when the cluster does not have quorum. Allowed values:
19 * +ignore:+ continue all resource management
20 * +freeze:+ continue resource management, but don't recover resources from nodes not in the affected partition
21 * +stop:+ stop all resources in the affected cluster partition
22+* +demote:+ demote promotable resources and stop all other resources in the
23+ affected cluster partition
24 * +suicide:+ fence all nodes in the affected cluster partition
25
26 | batch-limit | 0 |
diff --git a/debian/patches/ubuntu-2.0.3-demote/lp1896223-11-0b68344-Refactor-scheduler-functionize-checking-quorum.patch b/debian/patches/ubuntu-2.0.3-demote/lp1896223-11-0b68344-Refactor-scheduler-functionize-checking-quorum.patch
new file mode 100644
index 0000000..ac2e23d
--- /dev/null
+++ b/debian/patches/ubuntu-2.0.3-demote/lp1896223-11-0b68344-Refactor-scheduler-functionize-checking-quorum.patch
@@ -0,0 +1,61 @@
1From: Ken Gaillot <kgaillot@redhat.com>
2Date: Tue, 2 Jun 2020 15:05:56 -0500
3Subject: Refactor: scheduler: functionize checking quorum policy in effect
4
5... for readability and ease of future changes
6
7Author: Ken Gaillot <kgaillot@redhat.com>
8Origin: upstream, https://github.com/ClusterLabs/pacemaker/commit/0b68344
9Bug-Ubuntu: https://bugs.launchpad.net/bugs/1896223
10Reviewed-by: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
11Last-Update: 2020-10-05
12---
13 lib/pengine/utils.c | 18 ++++++++++++++----
14 1 file changed, 14 insertions(+), 4 deletions(-)
15
16diff --git a/lib/pengine/utils.c b/lib/pengine/utils.c
17index e57d858..b842481 100644
18--- a/lib/pengine/utils.c
19+++ b/lib/pengine/utils.c
20@@ -451,6 +451,17 @@ sort_rsc_priority(gconstpointer a, gconstpointer b)
21 return 0;
22 }
23
24+static enum pe_quorum_policy
25+effective_quorum_policy(pe_resource_t *rsc, pe_working_set_t *data_set)
26+{
27+ enum pe_quorum_policy policy = data_set->no_quorum_policy;
28+
29+ if (is_set(data_set->flags, pe_flag_have_quorum)) {
30+ policy = no_quorum_ignore;
31+ }
32+ return policy;
33+}
34+
35 action_t *
36 custom_action(resource_t * rsc, char *key, const char *task,
37 node_t * on_node, gboolean optional, gboolean save_action,
38@@ -554,6 +565,7 @@ custom_action(resource_t * rsc, char *key, const char *task,
39
40 if (rsc != NULL) {
41 enum action_tasks a_task = text2task(action->task);
42+ enum pe_quorum_policy quorum_policy = effective_quorum_policy(rsc, data_set);
43 int warn_level = LOG_TRACE;
44
45 if (save_action) {
46@@ -625,13 +637,11 @@ custom_action(resource_t * rsc, char *key, const char *task,
47 crm_trace("Action %s requires only stonith", action->uuid);
48 action->runnable = TRUE;
49 #endif
50- } else if (is_set(data_set->flags, pe_flag_have_quorum) == FALSE
51- && data_set->no_quorum_policy == no_quorum_stop) {
52+ } else if (quorum_policy == no_quorum_stop) {
53 pe_action_set_flag_reason(__FUNCTION__, __LINE__, action, NULL, "no quorum", pe_action_runnable, TRUE);
54 crm_debug("%s\t%s (cancelled : quorum)", action->node->details->uname, action->uuid);
55
56- } else if (is_set(data_set->flags, pe_flag_have_quorum) == FALSE
57- && data_set->no_quorum_policy == no_quorum_freeze) {
58+ } else if (quorum_policy == no_quorum_freeze) {
59 pe_rsc_trace(rsc, "Check resource is already active: %s %s %s %s", rsc->id, action->uuid, role2text(rsc->next_role), role2text(rsc->role));
60 if (rsc->fns->active(rsc, TRUE) == FALSE || rsc->next_role > rsc->role) {
61 pe_action_set_flag_reason(__FUNCTION__, __LINE__, action, NULL, "quorum freeze", pe_action_runnable, TRUE);
diff --git a/debian/patches/ubuntu-2.0.3-demote/lp1896223-12-b1ae359-Feature-scheduler-support-demote-choice-for.patch b/debian/patches/ubuntu-2.0.3-demote/lp1896223-12-b1ae359-Feature-scheduler-support-demote-choice-for.patch
new file mode 100644
index 0000000..c7fc280
--- /dev/null
+++ b/debian/patches/ubuntu-2.0.3-demote/lp1896223-12-b1ae359-Feature-scheduler-support-demote-choice-for.patch
@@ -0,0 +1,163 @@
1From: Ken Gaillot <kgaillot@redhat.com>
2Date: Tue, 2 Jun 2020 15:06:32 -0500
3Subject: Feature: scheduler: support "demote" choice for no-quorum-policy
4 option
5
6If quorum is lost, promotable resources in the master role will be demoted but
7left running, and all other resources will be stopped.
8
9 [Backport]
10
11 The existing pacemaker version did not yet have common/options.c,
12 and did not have a detailed pengine/pe_output.c text output function.
13 I have changed the html and xml output functions (as in the original patch)
14 and kept the same changes in everything else.
15
16Signed-off-by: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
17
18Author: Ken Gaillot <kgaillot@redhat.com>
19Origin: backport, https://github.com/ClusterLabs/pacemaker/commit/b1ae359
20Bug-Ubuntu: https://bugs.launchpad.net/bugs/1896223
21Reviewed-by: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
22Last-Update: 2020-10-05
23---
24 daemons/controld/controld_control.c | 2 +-
25 include/crm/pengine/pe_types.h | 3 ++-
26 lib/common/utils.c | 3 +++
27 lib/pengine/common.c | 2 +-
28 lib/pengine/unpack.c | 7 +++++++
29 lib/pengine/utils.c | 14 ++++++++++++++
30 tools/crm_mon_output.c | 9 +++++++++
31 7 files changed, 37 insertions(+), 3 deletions(-)
32
33diff --git a/daemons/controld/controld_control.c b/daemons/controld/controld_control.c
34index 6c7f97c..132b059 100644
35--- a/daemons/controld/controld_control.c
36+++ b/daemons/controld/controld_control.c
37@@ -587,7 +587,7 @@ static pe_cluster_option crmd_opts[] = {
38 { "stonith-max-attempts",NULL,"integer",NULL,"10",&check_positive_number,
39 "How many times stonith can fail before it will no longer be attempted on a target"
40 },
41- { "no-quorum-policy", NULL, "enum", "stop, freeze, ignore, suicide", "stop", &check_quorum, NULL, NULL },
42+ { "no-quorum-policy", NULL, "enum", "stop, freeze, ignore, demote, suicide", "stop", &check_quorum, NULL, NULL },
43 };
44 /* *INDENT-ON* */
45
46diff --git a/include/crm/pengine/pe_types.h b/include/crm/pengine/pe_types.h
47index 6e5cbcc..baa9160 100644
48--- a/include/crm/pengine/pe_types.h
49+++ b/include/crm/pengine/pe_types.h
50@@ -61,7 +61,8 @@ enum pe_quorum_policy {
51 no_quorum_freeze,
52 no_quorum_stop,
53 no_quorum_ignore,
54- no_quorum_suicide
55+ no_quorum_suicide,
56+ no_quorum_demote
57 };
58
59 enum node_type {
60diff --git a/lib/common/utils.c b/lib/common/utils.c
61index cb0bc1f..b114b44 100644
62--- a/lib/common/utils.c
63+++ b/lib/common/utils.c
64@@ -140,6 +140,9 @@ check_quorum(const char *value)
65 } else if (safe_str_eq(value, "ignore")) {
66 return TRUE;
67
68+ } else if (safe_str_eq(value, "demote")) {
69+ return TRUE;
70+
71 } else if (safe_str_eq(value, "suicide")) {
72 return TRUE;
73 }
74diff --git a/lib/pengine/common.c b/lib/pengine/common.c
75index fcd7cf0..d134d79 100644
76--- a/lib/pengine/common.c
77+++ b/lib/pengine/common.c
78@@ -75,7 +75,7 @@ check_placement_strategy(const char *value)
79 /* *INDENT-OFF* */
80 static pe_cluster_option pe_opts[] = {
81 /* name, old-name, validate, default, description */
82- { "no-quorum-policy", NULL, "enum", "stop, freeze, ignore, suicide", "stop", &check_quorum,
83+ { "no-quorum-policy", NULL, "enum", "stop, freeze, ignore, demote, suicide", "stop", &check_quorum,
84 "What to do when the cluster does not have quorum", NULL },
85 { "symmetric-cluster", NULL, "boolean", NULL, "true", &check_boolean,
86 "All resources can run anywhere by default", NULL },
87diff --git a/lib/pengine/unpack.c b/lib/pengine/unpack.c
88index e690c4e..3306662 100644
89--- a/lib/pengine/unpack.c
90+++ b/lib/pengine/unpack.c
91@@ -243,6 +243,9 @@ unpack_config(xmlNode * config, pe_working_set_t * data_set)
92 } else if (safe_str_eq(value, "freeze")) {
93 data_set->no_quorum_policy = no_quorum_freeze;
94
95+ } else if (safe_str_eq(value, "demote")) {
96+ data_set->no_quorum_policy = no_quorum_demote;
97+
98 } else if (safe_str_eq(value, "suicide")) {
99 if (is_set(data_set->flags, pe_flag_stonith_enabled)) {
100 int do_panic = 0;
101@@ -271,6 +274,10 @@ unpack_config(xmlNode * config, pe_working_set_t * data_set)
102 case no_quorum_stop:
103 crm_debug("On loss of quorum: Stop ALL resources");
104 break;
105+ case no_quorum_demote:
106+ crm_debug("On loss of quorum: "
107+ "Demote promotable resources and stop other resources");
108+ break;
109 case no_quorum_suicide:
110 crm_notice("On loss of quorum: Fence all remaining nodes");
111 break;
112diff --git a/lib/pengine/utils.c b/lib/pengine/utils.c
113index b842481..fd06e9e 100644
114--- a/lib/pengine/utils.c
115+++ b/lib/pengine/utils.c
116@@ -458,6 +458,20 @@ effective_quorum_policy(pe_resource_t *rsc, pe_working_set_t *data_set)
117
118 if (is_set(data_set->flags, pe_flag_have_quorum)) {
119 policy = no_quorum_ignore;
120+
121+ } else if (data_set->no_quorum_policy == no_quorum_demote) {
122+ switch (rsc->role) {
123+ case RSC_ROLE_MASTER:
124+ case RSC_ROLE_SLAVE:
125+ if (rsc->next_role > RSC_ROLE_SLAVE) {
126+ rsc->next_role = RSC_ROLE_SLAVE;
127+ }
128+ policy = no_quorum_ignore;
129+ break;
130+ default:
131+ policy = no_quorum_stop;
132+ break;
133+ }
134 }
135 return policy;
136 }
137diff --git a/tools/crm_mon_output.c b/tools/crm_mon_output.c
138index c27aa83..bb6b8b8 100644
139--- a/tools/crm_mon_output.c
140+++ b/tools/crm_mon_output.c
141@@ -472,6 +472,11 @@ cluster_options_html(pcmk__output_t *out, va_list args) {
142 out->list_item(out, NULL, "No Quorum policy: Stop ALL resources");
143 break;
144
145+ case no_quorum_demote:
146+ out->list_item(out, NULL, "No Quorum policy: Demote promotable "
147+ "resources and stop all other resources");
148+ break;
149+
150 case no_quorum_ignore:
151 out->list_item(out, NULL, "No Quorum policy: Ignore");
152 break;
153@@ -526,6 +531,10 @@ cluster_options_xml(pcmk__output_t *out, va_list args) {
154 xmlSetProp(node, (pcmkXmlStr) "no-quorum-policy", (pcmkXmlStr) "stop");
155 break;
156
157+ case no_quorum_demote:
158+ xmlSetProp(node, (pcmkXmlStr) "no-quorum-policy", (pcmkXmlStr) "demote");
159+ break;
160+
161 case no_quorum_ignore:
162 xmlSetProp(node, (pcmkXmlStr) "no-quorum-policy", (pcmkXmlStr) "ignore");
163 break;
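
Note on the no-quorum-policy=demote patch above: the per-resource decision made in `effective_quorum_policy()` can be illustrated with a standalone sketch. The enums and the `sketch_rsc` struct below are simplified stand-ins invented for this example (they are not Pacemaker's types), and the role ordering (slave below master) is an assumption mirroring the comparison used in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the scheduler's policy and role enums */
enum sketch_quorum_policy {
    SKETCH_QUORUM_FREEZE,
    SKETCH_QUORUM_STOP,
    SKETCH_QUORUM_IGNORE,
    SKETCH_QUORUM_SUICIDE,
    SKETCH_QUORUM_DEMOTE
};

enum sketch_role {
    SKETCH_ROLE_UNKNOWN,
    SKETCH_ROLE_STOPPED,
    SKETCH_ROLE_STARTED,
    SKETCH_ROLE_SLAVE,
    SKETCH_ROLE_MASTER
};

struct sketch_rsc {
    enum sketch_role role;       /* current role */
    enum sketch_role next_role;  /* role the scheduler wants next */
};

/* Return the policy to apply to this resource. With "demote" and no quorum,
 * promotable instances are capped at the slave role and otherwise left
 * alone, while every other resource is treated as under "stop".
 */
static enum sketch_quorum_policy
sketch_effective_policy(struct sketch_rsc *rsc, bool have_quorum,
                        enum sketch_quorum_policy configured)
{
    if (have_quorum) {
        return SKETCH_QUORUM_IGNORE;
    }
    if (configured != SKETCH_QUORUM_DEMOTE) {
        return configured;
    }
    switch (rsc->role) {
        case SKETCH_ROLE_MASTER:
        case SKETCH_ROLE_SLAVE:
            if (rsc->next_role > SKETCH_ROLE_SLAVE) {
                rsc->next_role = SKETCH_ROLE_SLAVE;  /* forbid (re)promotion */
            }
            return SKETCH_QUORUM_IGNORE;
        default:
            return SKETCH_QUORUM_STOP;
    }
}

/* Usage: a master instance is demoted, a plain started resource is stopped */
void
sketch_effective_policy_examples(void)
{
    struct sketch_rsc master = { SKETCH_ROLE_MASTER, SKETCH_ROLE_MASTER };
    struct sketch_rsc plain = { SKETCH_ROLE_STARTED, SKETCH_ROLE_STARTED };

    assert(sketch_effective_policy(&master, false, SKETCH_QUORUM_DEMOTE)
           == SKETCH_QUORUM_IGNORE);
    assert(master.next_role == SKETCH_ROLE_SLAVE);
    assert(sketch_effective_policy(&plain, false, SKETCH_QUORUM_DEMOTE)
           == SKETCH_QUORUM_STOP);
}
```

Returning "ignore" for promotable instances is what keeps them running: only the cap on the next role forces the demotion.
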
diff --git a/debian/patches/ubuntu-2.0.3-demote/lp1896223-13-d4b9117-Doc-Pacemaker-Explained-correct-on-fail-default.patch b/debian/patches/ubuntu-2.0.3-demote/lp1896223-13-d4b9117-Doc-Pacemaker-Explained-correct-on-fail-default.patch
new file mode 100644
index 0000000..f70ac52
--- /dev/null
+++ b/debian/patches/ubuntu-2.0.3-demote/lp1896223-13-d4b9117-Doc-Pacemaker-Explained-correct-on-fail-default.patch
@@ -0,0 +1,33 @@
1From: Ken Gaillot <kgaillot@redhat.com>
2Date: Tue, 26 May 2020 18:10:33 -0500
3Subject: Doc: Pacemaker Explained: correct on-fail default
4
5Author: Ken Gaillot <kgaillot@redhat.com>
6Origin: upstream, https://github.com/ClusterLabs/pacemaker/commit/d4b9117
7Bug-Ubuntu: https://bugs.launchpad.net/bugs/1896223
8Reviewed-by: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
9Last-Update: 2020-10-05
10---
11 doc/Pacemaker_Explained/en-US/Ch-Resources.txt | 9 +++++++--
12 1 file changed, 7 insertions(+), 2 deletions(-)
13
14diff --git a/doc/Pacemaker_Explained/en-US/Ch-Resources.txt b/doc/Pacemaker_Explained/en-US/Ch-Resources.txt
15index 9df9243..88892db 100644
16--- a/doc/Pacemaker_Explained/en-US/Ch-Resources.txt
17+++ b/doc/Pacemaker_Explained/en-US/Ch-Resources.txt
18@@ -669,8 +669,13 @@ XML attributes take precedence over +nvpair+ elements if both are specified.
19 indexterm:[Action,Property,timeout]
20
21 |on-fail
22-|restart '(except for +stop+ operations, which default to' fence 'when
23- STONITH is enabled and' block 'otherwise)'
24+a|Varies by action:
25+
26+* +stop+: +fence+ if +stonith-enabled+ is true or +block+ otherwise
27+* +demote+: +on-fail+ of the +monitor+ action with +role+ set to +Master+, if
28+ present, enabled, and configured to a value other than +demote+, or +restart+
29+ otherwise
30+* all other actions: +restart+
31 a|The action to take if this action ever fails. Allowed values:
32
33 * +ignore:+ Pretend the resource did not fail.
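
Note on the corrected on-fail default above: the three documented cases can be written out as a small selection function. This is a sketch of the documented behavior only, not the scheduler's implementation. `sketch_default_on_fail` is a hypothetical name, and its third parameter is assumed to be NULL when no enabled monitor with role Master is configured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: pick the default on-fail value for an operation.
 * op_name:                operation name, e.g. "stop", "demote", "start"
 * stonith_enabled:        value of the stonith-enabled cluster option
 * master_monitor_on_fail: on-fail of the enabled Master-role monitor,
 *                         or NULL if there is none
 */
static const char *
sketch_default_on_fail(const char *op_name, bool stonith_enabled,
                       const char *master_monitor_on_fail)
{
    if (strcmp(op_name, "stop") == 0) {
        return stonith_enabled? "fence" : "block";
    }
    if ((strcmp(op_name, "demote") == 0)
        && (master_monitor_on_fail != NULL)
        && (strcmp(master_monitor_on_fail, "demote") != 0)) {
        /* Inherit from the master monitor, unless that is itself "demote" */
        return master_monitor_on_fail;
    }
    return "restart";
}

/* Usage: one example per documented case */
void
sketch_default_on_fail_examples(void)
{
    assert(strcmp(sketch_default_on_fail("stop", true, NULL), "fence") == 0);
    assert(strcmp(sketch_default_on_fail("demote", true, "block"), "block") == 0);
    assert(strcmp(sketch_default_on_fail("demote", true, "demote"), "restart") == 0);
    assert(strcmp(sketch_default_on_fail("start", true, NULL), "restart") == 0);
}
```

Excluding "demote" from the inherited values avoids a failed demote being "recovered" by another demote.
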
