Corosync: Assertion 'sender_node != NULL' failed when bind iface is ready after corosync boots

Bug #1739033 reported by Victor Tapia
Affects            Status        Importance  Assigned to   Milestone
corosync (Ubuntu)  Fix Released  Undecided   Unassigned
Trusty             Fix Released  Medium      Victor Tapia
Xenial             Fix Released  Medium      Victor Tapia
Zesty              Fix Released  Undecided   Unassigned
Artful             Fix Released  Undecided   Unassigned

Bug Description

[Impact]

Corosync aborts (SIGABRT) if it starts before the interface it has to bind to is ready.

On boot, if no interface in the bindnetaddr range is up/configured, corosync binds to lo (127.0.0.1). Once an applicable interface is up, corosync crashes with the following error message:

corosync: votequorum.c:2019: message_handler_req_exec_votequorum_nodeinfo: Assertion `sender_node != NULL' failed.
Aborted (core dumped)

The last log entries show that the interface is trying to join the cluster:

Dec 19 11:36:05 [22167] xenial-pacemaker corosync debug [TOTEM ] totemsrp.c:2089 entering OPERATIONAL state.
Dec 19 11:36:05 [22167] xenial-pacemaker corosync notice [TOTEM ] totemsrp.c:2095 A new membership (169.254.241.10:444) was formed. Members joined: 704573706

During the quorum calculation, the generated nodeid (704573706) for the node is being used instead of the nodeid specified in the configuration file (1), and the assert fails because the nodeid is not present in the member list. Corosync should use the correct nodeid and continue running after the interface is up, as shown in a fixed corosync boot:

Dec 19 11:50:56 [4824] xenial-corosync corosync notice [TOTEM ] totemsrp.c:2095 A new membership (169.254.241.10:80) was formed. Members joined: 1
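
The generated nodeid in the failing log is not arbitrary. It equals the ring address 169.254.241.10 read as a 32-bit big-endian integer with the highest bit cleared. The sketch below only checks that arithmetic; the function name nodeid_from_ipv4 and its clear_high_bit parameter are made up for illustration, and the exact code path corosync uses to derive the value is assumed here, not confirmed by this report.

```python
import ipaddress


def nodeid_from_ipv4(addr: str, clear_high_bit: bool = True) -> int:
    """Illustrative helper (not corosync code): map an IPv4 address to a 32-bit ID."""
    # int() on an IPv4Address yields the address as a big-endian 32-bit integer.
    value = int(ipaddress.IPv4Address(addr))
    if clear_high_bit:
        # Assumption: the top bit is masked off so the ID fits a positive signed int.
        value &= 0x7FFFFFFF
    return value


if __name__ == "__main__":
    print(nodeid_from_ipv4("169.254.241.10"))  # 704573706, as seen in the failing log
```

This is why the node shows up as 704573706 and not as the configured nodeid 1: the ID was derived from the address instead of being taken from the configuration.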

[Environment]

Xenial 16.04.3

Packages:

ii corosync 2.3.5-3ubuntu1 amd64 cluster engine daemon and utilities
ii libcorosync-common4:amd64 2.3.5-3ubuntu1 amd64 cluster engine common library

[Test Case]

Config:

totem {
        version: 2
        member {
                memberaddr: 169.254.241.10
        }
        member {
                memberaddr: 169.254.241.20
        }
        transport: udpu

        crypto_cipher: none
        crypto_hash: none
        nodeid: 1
        interface {
                ringnumber: 0
                bindnetaddr: 169.254.241.0
                mcastport: 5405
                ttl: 1
        }
}

quorum {
        provider: corosync_votequorum
        expected_votes: 2
}

nodelist {
        node {
                ring0_addr: 169.254.241.10
                nodeid: 1
        }
        node {
                ring0_addr: 169.254.241.20
                nodeid: 2
        }
}

1. ifdown interface (169.254.241.10)
2. start corosync (/usr/sbin/corosync -f)
3. ifup interface
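
The three steps above can be scripted. This is a minimal sketch, assuming root privileges, the ifupdown tools, and an interface name such as eth1 (hypothetical; the report does not name the interface). The helper names and the timing values are also made up for illustration. Because corosync -f stays in the foreground, the sketch starts it as a child process before bringing the interface up.

```python
import subprocess
import time
from typing import List, Optional


def reproducer_steps(iface: str) -> List[List[str]]:
    """Return the three commands from the test case, in order."""
    return [
        ["ifdown", iface],
        ["/usr/sbin/corosync", "-f"],
        ["ifup", iface],
    ]


def run_reproducer(iface: str, settle_seconds: float = 5.0,
                   observe_seconds: float = 30.0) -> Optional[int]:
    """Run the steps; return corosync's exit status, or None if it kept running.

    On POSIX a negative status -N means the child was killed by signal N,
    so -6 would correspond to SIGABRT.
    """
    down, start, up = reproducer_steps(iface)
    subprocess.run(down, check=True)
    daemon = subprocess.Popen(start)
    try:
        time.sleep(settle_seconds)  # let corosync bind to loopback first
        subprocess.run(up, check=True)
        try:
            return daemon.wait(timeout=observe_seconds)
        except subprocess.TimeoutExpired:
            return None  # still running: no crash observed in the window
    finally:
        if daemon.poll() is None:
            daemon.terminate()
            daemon.wait()


if __name__ == "__main__":
    # Dry run: only print the commands; call run_reproducer() explicitly as root.
    for argv in reproducer_steps("eth1"):
        print(" ".join(argv))
```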

[Regression Potential]

This patch changes how corosync parses its configuration file at startup, so any regression would most likely appear as a problem during corosync startup and/or configuration parsing.

[Other info]

# Upstream corosync commit :
https://github.com/corosync/corosync/commit/aab55a004bb12ebe78db341dc56759dfe710c1b2

# git describe aab55a004bb12ebe78db341dc56759dfe710c1b2
v2.3.5-45-gaab55a0

# rmadison corosync
corosync | 2.3.3-1ubuntu1 | trusty | source, amd64, arm64, armhf, i386, powerpc, ppc64el
corosync | 2.3.3-1ubuntu3 | trusty-updates | source, amd64, arm64, armhf, i386, powerpc, ppc64el
corosync | 2.3.5-3ubuntu1 | xenial | source, amd64, arm64, armhf, i386, powerpc, ppc64el, s390x
corosync | 2.4.2-3build1 | zesty | source, amd64, arm64, armhf, i386, ppc64el, s390x
corosync | 2.4.2-3build1 | artful | source, amd64, arm64, armhf, i386, ppc64el, s390x
corosync | 2.4.2-3build1 | bionic | source, amd64, arm64, armhf, i386, ppc64el, s390x
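
The git describe output reads as "45 commits after tag v2.3.5", which gives a rough way to tell which archive versions predate the fix. The sketch below parses that string and compares it with the upstream part of each package version. The function names are made up for illustration, and the comparison assumes a linear release history: a package at or below the base tag lacks the commit unless it was backported, and a later release is only likely to contain it.

```python
import re
from typing import Tuple


def parse_git_describe(text: str) -> Tuple[Tuple[int, ...], int, str]:
    """Split 'v2.3.5-45-gaab55a0' into (base tag, commits since tag, short hash)."""
    match = re.fullmatch(r"v(\d+(?:\.\d+)*)-(\d+)-g([0-9a-f]+)", text)
    if match is None:
        raise ValueError("unexpected git describe output: %r" % text)
    tag = tuple(int(part) for part in match.group(1).split("."))
    return tag, int(match.group(2)), match.group(3)


def upstream_version(package_version: str) -> Tuple[int, ...]:
    """Extract the upstream part of a package version: '2.3.5-3ubuntu1' -> (2, 3, 5)."""
    upstream = package_version.split("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))


def predates_fix(package_version: str, describe: str) -> bool:
    """True if the upstream version is at or below the tag the fix was built on."""
    base_tag, _commits, _sha = parse_git_describe(describe)
    return upstream_version(package_version) <= base_tag


if __name__ == "__main__":
    for version in ("2.3.3-1ubuntu3", "2.3.5-3ubuntu1", "2.4.2-3build1"):
        print(version, predates_fix(version, "v2.3.5-45-gaab55a0"))
```

Under that assumption, the trusty and xenial versions predate the fix and the 2.4.2 packages do not.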

Victor Tapia (vtapia)
description: updated
Victor Tapia (vtapia)
description: updated
Victor Tapia (vtapia) wrote :

I just noticed that the member{} groups in my config should be inside interface{}. Having them directly under totem{} triggers the bug fixed by this commit: https://github.com/corosync/corosync/commit/aab55a004bb12ebe78db341dc56759dfe710c1b2
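
A quick way to spot this kind of misplaced entry is to track brace nesting and flag any member{} block whose parent is not interface{}. The following is a rough sketch, not the corosync parser: the function name is made up, and it assumes one block opener or closing brace per line, as in the config from the description.

```python
from typing import List


def misplaced_member_blocks(conf_text: str) -> List[str]:
    """Return dotted paths of member{} blocks that are not directly inside interface{}."""
    stack: List[str] = []
    misplaced: List[str] = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.endswith("{"):
            name = line[:-1].strip()
            if name == "member" and (not stack or stack[-1] != "interface"):
                misplaced.append(".".join(stack + [name]))
            stack.append(name)
        elif line == "}" and stack:
            stack.pop()
    return misplaced


BAD = """
totem {
        member {
                memberaddr: 169.254.241.10
        }
        interface {
                bindnetaddr: 169.254.241.0
        }
}
"""

if __name__ == "__main__":
    print(misplaced_member_blocks(BAD))  # ['totem.member']
```

Run against the config in the description, it reports both member{} groups as sitting directly under totem{}.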

description: updated
Eric Desrochers (slashd)
tags: added: sts sts-sponsor-ddstreet
Eric Desrochers (slashd)
Changed in corosync (Ubuntu):
status: New → Fix Released
Changed in corosync (Ubuntu Trusty):
assignee: nobody → Victor Tapia (vtapia)
Changed in corosync (Ubuntu Xenial):
assignee: nobody → Victor Tapia (vtapia)
status: New → In Progress
Changed in corosync (Ubuntu Trusty):
status: New → In Progress
importance: Undecided → Medium
Changed in corosync (Ubuntu Xenial):
importance: Undecided → Medium
Victor Tapia (vtapia) wrote :
Victor Tapia (vtapia) wrote :
Dan Streetman (ddstreet) wrote :

sponsored to trusty and xenial

Dan Streetman (ddstreet)
description: updated
Eric Desrochers (slashd)
tags: added: sts-sponsor-ddstreet-done
removed: sts-sponsor-ddstreet
Brian Murray (brian-murray) wrote :

Is this fixed in the artful and bionic versions of corosync in the archive?

Changed in corosync (Ubuntu Xenial):
status: In Progress → Incomplete
Victor Tapia (vtapia) wrote :

Yes, the fix is already included in zesty+ (2.4.0+)

Eric Desrochers (slashd) wrote :

To reinforce Victor's answer (comment #6) and help answer Brian's question (comment #5):

## corosync upstream commit :
# git describe aab55a004bb12ebe78db341dc56759dfe710c1b2
v2.3.5-45-gaab55a0

# rmadison corosync
 corosync | 2.4.2-3build1 | zesty | source, amd64, arm64, armhf, i386, ppc64el, s390x
 corosync | 2.4.2-3build1 | artful | source, amd64, arm64, armhf, i386, ppc64el, s390x
 corosync | 2.4.2-3build1 | bionic | source, amd64, arm64, armhf, i386, ppc64el, s390x

- Eric

Dan Streetman (ddstreet)
Changed in corosync (Ubuntu Xenial):
status: Incomplete → In Progress
Eric Desrochers (slashd)
Changed in corosync (Ubuntu Zesty):
status: New → Fix Released
Changed in corosync (Ubuntu Artful):
status: New → Fix Released
description: updated
description: updated
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Victor, or anyone else affected,

Accepted corosync into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/corosync/2.3.5-3ubuntu2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in corosync (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-xenial
Changed in corosync (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed-trusty
Brian Murray (brian-murray) wrote :

Hello Victor, or anyone else affected,

Accepted corosync into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/corosync/2.3.3-1ubuntu4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-trusty to verification-done-trusty. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-trusty. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Victor Tapia (vtapia) wrote :

#VERIFICATION FOR XENIAL

- Packages
ii corosync 2.3.5-3ubuntu2 amd64 cluster engine daemon and utilities
ii libcorosync-common4:amd64 2.3.5-3ubuntu2 amd64 cluster engine common library

- Reproducer
Using a config file with bad entries (as shown in the description)
ifdown interface
/usr/sbin/corosync -f
ifup interface

- Debug output:

Dec 22 11:22:01 xenial-corosync corosync[2742]: [TOTEM ] totemudpu.c:408 sendmsg(mcast) failed (non-critical): Invalid argument (22)
Dec 22 11:22:02 xenial-corosync corosync[2742]: message repeated 14 times: [ [TOTEM ] totemudpu.c:408 sendmsg(mcast) failed (non-critical): Invalid argument (22)]
Dec 22 11:22:02 xenial-corosync corosync[2742]: [TOTEM ] totemudpu.c:619 The network interface [169.254.241.20] is now up.
Dec 22 11:22:02 xenial-corosync corosync[2742]: [TOTEM ] totemudpu.c:1125 adding new UDPU member {169.254.241.10}
Dec 22 11:22:02 xenial-corosync corosync[2742]: [TOTEM ] totemudpu.c:1125 adding new UDPU member {169.254.241.20}
Dec 22 11:22:02 xenial-corosync corosync[2742]: [TOTEM ] totemsrp.c:2175 entering GATHER state from 15(interface change).
Dec 22 11:22:05 xenial-corosync corosync[2742]: [TOTEM ] totemsrp.c:2175 entering GATHER state from 0(consensus timeout).
Dec 22 11:22:05 xenial-corosync corosync[2742]: [TOTEM ] totemsrp.c:3227 Creating commit token because I am the rep.
Dec 22 11:22:05 xenial-corosync corosync[2742]: [TOTEM ] totemsrp.c:1591 Saving state aru 0 high seq received 0
Dec 22 11:22:06 xenial-corosync corosync[2742]: [TOTEM ] totemsrp.c:2224 entering COMMIT state.
Dec 22 11:22:06 xenial-corosync corosync[2742]: [TOTEM ] totemsrp.c:4571 got commit token
Dec 22 11:22:06 xenial-corosync corosync[2742]: [TOTEM ] totemsrp.c:2261 entering RECOVERY state.
Dec 22 11:22:06 xenial-corosync corosync[2742]: [TOTEM ] totemsrp.c:2307 position [0] member 169.254.241.20:
Dec 22 11:22:06 xenial-corosync corosync[2742]: [TOTEM ] totemsrp.c:2311 previous ring seq 4c rep 127.0.0.1
Dec 22 11:22:06 xenial-corosync corosync[2742]: [TOTEM ] totemsrp.c:2317 aru 0 high delivered 0 received flag 1
Dec 22 11:22:06 xenial-corosync corosync[2742]: [TOTEM ] totemsrp.c:2415 Did not need to originate any messages in recovery.
Dec 22 11:22:06 xenial-corosync corosync[2742]: [TOTEM ] totemsrp.c:4571 got commit token
Dec 22 11:22:06 xenial-corosync corosync[2742]: [TOTEM ] totemsrp.c:4632 Sending initial ORF token
Dec 22 11:22:06 xenial-corosync corosync[2742]: [TOTEM ] totemsrp.c:3828 token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru 0
Dec 22 11:22:06 xenial-corosync corosync[2742]: [TOTEM ] totemsrp.c:3839 install seq 0 aru 0 high seq received 0
Dec 22 11:22:06 xenial-corosync corosync[2742]: [TOTEM ] totemsrp.c:3828 token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
Dec 22 11:22:06 xenial-corosync corosync[2742]: [TOTEM ] totemsrp.c:3839 install seq 0 aru 0 high seq received 0
Dec 22 11:22:06 xenial-corosync corosync[2742]: [TOTEM ] totemsrp.c:3828 token retrans flag is 0 my set retrans flag0 retrans que...

Victor Tapia (vtapia) wrote :

#VERIFICATION FOR TRUSTY

- Packages
ii corosync 2.3.3-1ubuntu4 amd64 Standards-based cluster framework (daemon and modules)
ii libcorosync-common4 2.3.3-1ubuntu4 amd64 Standards-based cluster framework, common library

- Reproducer
Using a config file with bad entries (as shown in the description)
ifdown interface
/usr/sbin/corosync -f
ifup interface

- Debug output:

Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] entering GATHER state from 0(consensus timeout).
Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] Creating commit token because I am the rep.
Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] Saving state aru 0 high seq received 0
Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] Storing new sequence id for ring 4
Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] entering COMMIT state.
Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] got commit token
Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] entering RECOVERY state.
Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] position [0] member 169.254.241.20:
Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] previous ring seq 0 rep 127.0.0.1
Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] aru 0 high delivered 0 received flag 1
Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] Did not need to originate any messages in recovery.
Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] got commit token
Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] Sending initial ORF token
Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru 0
Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] install seq 0 aru 0 high seq received 0
Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] install seq 0 aru 0 high seq received 0
Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] install seq 0 aru 0 high seq received 0
Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] install seq 0 aru 0 high seq received 0
Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0
Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] Resetting old ring state
Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] recovery to regular 1-0
Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] waiting_trans_ack changed to 1
Dec 22 12:18:27 trusty-corosync corosync[3910]: [MAIN ] Member joined: r(0) ip(169.254.241.20)
Dec 22 12:18:27 trusty-corosync corosync[3910]: [TOTEM ] entering OPERATIONAL state....

Victor Tapia (vtapia)
tags: added: verification-done verification-done-trusty verification-done-xenial
removed: verification-needed verification-needed-trusty verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package corosync - 2.3.5-3ubuntu2

---------------
corosync (2.3.5-3ubuntu2) xenial; urgency=medium

  * d/p/Parser-Make-config-file-parser-more-hierarchy.patch: Fixes how
    corosync parses a config file with malformed entries (LP: #1739033).

 -- Victor Tapia <email address hidden> Wed, 20 Dec 2017 12:37:52 +0100

Changed in corosync (Ubuntu Xenial):
status: Fix Committed → Fix Released
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for corosync has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. If you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package corosync - 2.3.3-1ubuntu4

---------------
corosync (2.3.3-1ubuntu4) trusty; urgency=medium

  * d/p/Parser-Make-config-file-parser-more-hierarchy.patch: Fixes how
    corosync parses a config file with malformed entries (LP: #1739033).

 -- Victor Tapia <email address hidden> Wed, 20 Dec 2017 12:39:54 +0100

Changed in corosync (Ubuntu Trusty):
status: Fix Committed → Fix Released