Merge ~rafaeldtinoco/ubuntu/+source/resource-agents:lp1895348-groovy into ubuntu/+source/resource-agents:ubuntu/groovy-devel

Proposed by Rafael David Tinoco
Status: Merged
Approved by: Rafael David Tinoco
Approved revision: 8cd36c545a9bbf98a2a71b7c11fd62d77642fe21
Merged at revision: 8cd36c545a9bbf98a2a71b7c11fd62d77642fe21
Proposed branch: ~rafaeldtinoco/ubuntu/+source/resource-agents:lp1895348-groovy
Merge into: ubuntu/+source/resource-agents:ubuntu/groovy-devel
Diff against target: 185 lines (+151/-0)
5 files modified
debian/changelog (+11/-0)
debian/patches/series (+4/-0)
debian/patches/ubuntu/lp1895348-azure-lb-Don-t-redirect-nc-listener.patch (+57/-0)
debian/patches/ubuntu/lp1895348-build-fix-distcheck-issue.patch (+17/-0)
debian/patches/ubuntu/lp1895348-galera-Fix-automatic-recovery.patch (+62/-0)
Reviewer Review Type Date Requested Status
Rafael David Tinoco (community) Approve
Bryce Harrington (community) Approve
Canonical Server Core Reviewers Pending
Review via email: mp+390636@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Bryce Harrington (bryce) wrote :

  - [d22700fc] azure-lb: Don't redirect nc listener output to pidfile

LGTM +1, this looks like a cleaner approach for capturing the pid, than from the eval string.

  - [3b0ffc59] build: fix distcheck issue introduced by including
               READMEs in heartbeat directory

LGTM +1, trivial

  - [73551ac0] galera: Fix automatic recovery when a cluster was not
               gracefully stopped

LGTM +1, I had to think the refactoring logic through but it does make sense for the described behavior. I did not reproduce the problem myself, but can see now how it would occur.

review: Approve
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Thanks for reviewing this @bryce!! That was quick!

$ git ubuntu tag --upload

$ git describe
upload/1%4.6.1-1ubuntu2

$ git push pkg upload/1%4.6.1-1ubuntu2
Counting objects: 14, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (14/14), done.
Writing objects: 100% (14/14), 4.31 KiB | 441.00 KiB/s, done.
Total 14 (delta 6), reused 0 (delta 0)
To ssh://git.launchpad.net/ubuntu/+source/resource-agents
 * [new tag] upload/1%4.6.1-1ubuntu2 -> upload/1%4.6.1-1ubuntu2

$ dput ubuntu resource-agents_4.6.1-1ubuntu2_source.changes
Checking signature on .changes
gpg: /home/rafaeldtinoco/work/sources/ubuntu/resource-agents_4.6.1-1ubuntu2_source.changes: Valid signature from A93E0E0AD83C0D0F
Checking signature on .dsc
gpg: /home/rafaeldtinoco/work/sources/ubuntu/resource-agents_4.6.1-1ubuntu2.dsc: Valid signature from A93E0E0AD83C0D0F
Uploading to ubuntu (via ftp to upload.ubuntu.com):
  Uploading resource-agents_4.6.1-1ubuntu2.dsc: done.
  Uploading resource-agents_4.6.1-1ubuntu2.debian.tar.xz: done.
  Uploading resource-agents_4.6.1-1ubuntu2_source.buildinfo: done.
  Uploading resource-agents_4.6.1-1ubuntu2_source.changes: done.
Successfully uploaded packages.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/debian/changelog b/debian/changelog
2index 440d124..21a29d6 100644
3--- a/debian/changelog
4+++ b/debian/changelog
5@@ -1,3 +1,14 @@
6+resource-agents (1:4.6.1-1ubuntu2) groovy; urgency=medium
7+
8+ * d/p/u/lp1895348-*: Post Release Fixes (LP: #1895348):
9+ - [d22700fc] azure-lb: Don't redirect nc listener output to pidfile
10+ - [3b0ffc59] build: fix distcheck issue introduced by including
11+ READMEs in heartbeat directory
12+ - [73551ac0] galera: Fix automatic recovery when a cluster was not
13+ gracefully stopped
14+
15+ -- Rafael David Tinoco <rafaeldtinoco@ubuntu.com> Fri, 11 Sep 2020 19:30:14 +0000
16+
17 resource-agents (1:4.6.1-1ubuntu1) groovy; urgency=medium
18
19 * Merge with Debian unstable (LP: #1886603). Remaining changes:
20diff --git a/debian/patches/series b/debian/patches/series
21index c54dc17..3b9ba6c 100644
22--- a/debian/patches/series
23+++ b/debian/patches/series
24@@ -7,3 +7,7 @@ gitignore.patch
25 reproducible.patch
26 var-run.patch
27 bashism.patch
28+# ubuntu: after release fixes
29+ubuntu/lp1895348-azure-lb-Don-t-redirect-nc-listener.patch
30+ubuntu/lp1895348-galera-Fix-automatic-recovery.patch
31+ubuntu/lp1895348-build-fix-distcheck-issue.patch
32diff --git a/debian/patches/ubuntu/lp1895348-azure-lb-Don-t-redirect-nc-listener.patch b/debian/patches/ubuntu/lp1895348-azure-lb-Don-t-redirect-nc-listener.patch
33new file mode 100644
34index 0000000..1e2ff23
35--- /dev/null
36+++ b/debian/patches/ubuntu/lp1895348-azure-lb-Don-t-redirect-nc-listener.patch
37@@ -0,0 +1,57 @@
38+From d22700fc5d5098c683b465ea0fede43803fd4d6b Mon Sep 17 00:00:00 2001
39+From: Reid wahl <nrwahl@protonmail.com>
40+Date: Tue, 7 Jul 2020 02:18:09 -0700
41+Subject: [PATCH] azure-lb: Don't redirect nc listener output to pidfile
42+
43+The `lb_start()` function spawns an `nc` listener background process
44+and echoes the resulting pid to `$pidfile`. Due to a bug in the
45+redirection, all future data received by the `nc` process is also
46+appended to `$pidfile`.
47+
48+If binary data is received later and appended to `$pidfile`, the
49+monitor operation fails when `grep` searches the now-binary file.
50+
51+```
52+line 97: kill: Binary: arguments must be process or job IDs ]
53+line 97: kill: file: arguments must be process or job IDs ]
54+line 97: kill: /var/run/nc_PF2_02.pid: arguments must be process or job
55+ IDs ]
56+line 97: kill: matches: arguments must be process or job IDs ]
57+```
58+
59+Then the start operation fails during recovery. `lb_start()` spawns a
60+new `nc` process, but the old process is still running and using the
61+configured port.
62+
63+```
64+nc_PF2_02_start_0:777:stderr [ Ncat: bind to :::62502: Address
65+ already in use. QUITTING. ]
66+```
67+
68+This patch fixes the issue by removing the `nc &` command from the
69+section whose output gets redirected to `$pidfile`. Now, only the `nc`
70+PID is echoed to `$pidfile`.
71+
72+Resolves: RHBZ#1850778
73+Resolves: RHBZ#1850779
74+---
75+ heartbeat/azure-lb | 3 ++-
76+ 1 file changed, 2 insertions(+), 1 deletion(-)
77+
78+diff --git a/heartbeat/azure-lb b/heartbeat/azure-lb
79+index 05c13451..05755d77 100755
80+--- a/heartbeat/azure-lb
81++++ b/heartbeat/azure-lb
82+@@ -113,7 +113,8 @@ lb_start() {
83+ if ! lb_monitor; then
84+ ocf_log debug "Starting $process: $cmd"
85+ # Execute the command as created above
86+- eval "$cmd & echo \$!" > $pidfile
87++ $cmd &
88++ echo $! > $pidfile
89+ if lb_monitor; then
90+ ocf_log debug "$process: $cmd started successfully, calling monitor"
91+ lb_monitor
92+--
93+2.27.0
94+
95diff --git a/debian/patches/ubuntu/lp1895348-build-fix-distcheck-issue.patch b/debian/patches/ubuntu/lp1895348-build-fix-distcheck-issue.patch
96new file mode 100644
97index 0000000..992f940
98--- /dev/null
99+++ b/debian/patches/ubuntu/lp1895348-build-fix-distcheck-issue.patch
100@@ -0,0 +1,17 @@
101+From 3b0ffc598ea75d8dad2c7efe52c73437e5eadb17 Mon Sep 17 00:00:00 2001
102+From: Oyvind Albrigtsen <oalbrigt@redhat.com>
103+Date: Tue, 25 Aug 2020 13:03:09 +0200
104+Subject: [PATCH] build: fix distcheck issue introduced by including READMEs in
105+ heartbeat directory
106+
107+--- resource-agents-4.6.1.orig/heartbeat/Makefile.am
108++++ resource-agents-4.6.1/heartbeat/Makefile.am
109+@@ -21,7 +21,7 @@ MAINTAINERCLEANFILES = Makefile.in
110+
111+ EXTRA_DIST = $(ocf_SCRIPTS) $(ocfcommon_DATA) \
112+ $(common_DATA) $(hb_DATA) $(dtd_DATA) \
113+- README
114++ README README.galera
115+
116+ AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/linux-ha
117+
118diff --git a/debian/patches/ubuntu/lp1895348-galera-Fix-automatic-recovery.patch b/debian/patches/ubuntu/lp1895348-galera-Fix-automatic-recovery.patch
119new file mode 100644
120index 0000000..fb55f06
121--- /dev/null
122+++ b/debian/patches/ubuntu/lp1895348-galera-Fix-automatic-recovery.patch
123@@ -0,0 +1,62 @@
124+From 73551ac029c7480cf710c392ad2d500ef2a7b07a Mon Sep 17 00:00:00 2001
125+From: Petr Pavlu <petr.pavlu@suse.com>
126+Date: Wed, 26 Aug 2020 13:59:16 +0200
127+Subject: [PATCH] galera: Fix automatic recovery when a cluster was not
128+ gracefully stopped
129+
130+When selecting a bootstrap node, the Galera resource agent primarily
131+depends on the safe_to_bootstrap flag in grastate.dat. If none of the
132+nodes have this flag set to 1 then functions detect_last_commit() +
133+detect_first_master() provide a recovery logic to select the bootstrap
134+node based on each node's last commit, as obtained from grastate.dat or
135+'mysqld_safe --wsrep-recover'.
136+
137+Fix 65f35e9172407e64ded90f29ea8fc0dfca9643e3 introduced a problem for
138+this recovery logic. If a whole cluster is not gracefully stopped,
139+grastate.dat on every node contains "safe_to_bootstrap: 0" and
140+"seqno: -1". Function detect_safe_to_bootstrap() then considers each
141+node with this seqno as not suitable for bootstraping and clears the
142+safe_to_bootstrap attribute. Nonetheless, functions detect_last_commit()
143++ detect_first_master() successfully find a bootstrap node, relying on
144+the recovery logic. However, when the promote operation is invoked,
145+function galera_promote() runs check '"$(get_safe_to_bootstrap)" = "0"'
146+which fails and prevents the code from writing "safe_to_bootstrap: 1"
147+into grastate.dat of the selected node to mark it as a bootstrap node.
148+The end result is that Galera refuses to be started on this node and
149+therefore the whole cluster remains down.
150+
151+The patch fixes the problem by adjusting detect_safe_to_bootstrap() to
152+accept the combination of "safe_to_bootstrap: 0" and "seqno: -1" and
153+allow a node with this state to potentially become a bootstrap node.
154+---
155+ heartbeat/galera | 11 ++++++++---
156+ 1 file changed, 8 insertions(+), 3 deletions(-)
157+
158+diff --git a/heartbeat/galera b/heartbeat/galera
159+index 4a313e24..74f11d8c 100755
160+--- a/heartbeat/galera
161++++ b/heartbeat/galera
162+@@ -604,12 +604,17 @@ detect_safe_to_bootstrap()
163+ seqno=$(sed -n 's/^seqno:\s*\(.*\)$/\1/p' < ${OCF_RESKEY_datadir}/grastate.dat)
164+ fi
165+
166+- if [ -z "$uuid" ] || [ -z "$seqno" ] || \
167+- [ "$uuid" = "00000000-0000-0000-0000-000000000000" ] || \
168+- [ "$seqno" = "-1" ]; then
169++ if [ -z "$uuid" ] || \
170++ [ "$uuid" = "00000000-0000-0000-0000-000000000000" ]; then
171+ clear_safe_to_bootstrap
172+ return
173+ fi
174++ if [ "$safe_to_bootstrap" = "1" ]; then
175++ if [ -z "$seqno" ] || [ "$seqno" = "-1" ]; then
176++ clear_safe_to_bootstrap
177++ return
178++ fi
179++ fi
180+
181+ if [ "$safe_to_bootstrap" = "1" ] || [ "$safe_to_bootstrap" = "0" ]; then
182+ set_safe_to_bootstrap $safe_to_bootstrap
183+--
184+2.27.0
185+

Subscribers

People subscribed via source and target branches