Redeploys to same model fail

Bug #1903625 reported by Stuart Bishop
This bug affects 1 person
Affects: charm-k8s-postgresql
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

Per https://bugs.launchpad.net/juju/+bug/1903623 , the mechanism used by the charm to map unit names to pod names does not always work. In particular, if you remove a PostgreSQL deployment and redeploy it to the same model with the same name, it will fail.

Stuart Bishop (stub) wrote :

Likely best fixed by implementing lp:1904821

Andre Ruiz (andre-ruiz) wrote :

I just had this problem. In the logs you can see "I'm not the master, cloning the master" (eventually it times out).

Seems like the code around https://git.launchpad.net/charm-k8s-postgresql/tree/files/pgcharm.py#n466 is returning a leader when it should not.

Mariyan Dimitrov (merkata) wrote :

The issue is actually one line lower, at https://git.launchpad.net/charm-k8s-postgresql/tree/files/pgcharm.py#n467.

This checks that the current unit is the first of all the units that belong to the application, but there are some caveats:

JUJU_UNIT_NAME is not taken from an environment variable set by Juju; instead it is (awkwardly) constructed by joining the application name and the pod name, which relies on two APIs, one from Juju and one from Kubernetes, at https://git.launchpad.net/charm-k8s-postgresql/tree/files/pgcharm.py#n52
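To illustrate why this is fragile, here is a minimal Python sketch of such a construction. It is not the charm's actual code: the function name `unit_name_from_pod` is hypothetical, and it assumes StatefulSet pod names of the form `<name>-<ordinal>`. The resulting "unit name" depends only on the pod ordinal that Kubernetes assigns, not on the unit number Juju assigned.

```python
def unit_name_from_pod(app_name: str, pod_name: str) -> str:
    """Hypothetical reconstruction of the fragile approach: derive a Juju
    unit name from the Kubernetes pod name of a StatefulSet member."""
    # StatefulSet pods are named "<statefulset>-<ordinal>", e.g. "postgresql-0".
    ordinal = pod_name.rsplit("-", 1)[1]
    return f"{app_name}/{ordinal}"


print(unit_name_from_pod("postgresql", "postgresql-0"))  # postgresql/0
```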

JUJU_EXPECTED_UNITS is constructed by querying only the Juju API, and it is a sorted list of units.
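A minimal sketch of producing such a sorted list, assuming the unit names have already been fetched from Juju. The helper name `expected_units` is hypothetical. The comment does not say how the charm sorts; this sketch sorts on the numeric unit number, a design choice that keeps "postgresql/10" after "postgresql/2" (a plain string sort would not).

```python
def expected_units(unit_names: list[str]) -> list[str]:
    """Hypothetical helper: order Juju unit names by their unit number."""
    # Sort on the integer after "/" so that "app/10" comes after "app/2".
    return sorted(unit_names, key=lambda name: int(name.rsplit("/", 1)[1]))


print(expected_units(["postgresql/10", "postgresql/2", "postgresql/3"]))
# ['postgresql/2', 'postgresql/3', 'postgresql/10']
```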

Initially, JUJU_UNIT_NAME equals JUJU_EXPECTED_UNITS[0] (both are /0). With every redeployment and revision the two drift apart, and this comparison no longer matches.

Every time you compare JUJU_UNIT_NAME with JUJU_EXPECTED_UNITS, you are comparing a unit name that carries the ordinal of a pod name. Because every application is deployed as a StatefulSet, pod ordinals always start at 0 and increment. The expected units, on the other hand, continue incrementing from the numbers used by the previous revision.
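A small simulation of the comparison makes the drift concrete. The function names are hypothetical, and the redeployed unit numbers (3 to 5) are an assumption that follows the description above of units continuing from the previous numbering. After the redeploy no pod considers itself the master, which is consistent with the "I'm not the master, cloning the master" log message reported earlier in this thread.

```python
def pod_derived_unit(app_name: str, pod_name: str) -> str:
    # Unit name built from the pod ordinal, as described above (hypothetical).
    return f"{app_name}/{pod_name.rsplit('-', 1)[1]}"


def is_master(unit_name: str, expected: list[str]) -> bool:
    # The check described above: the first expected unit is the master.
    return unit_name == expected[0]


app = "postgresql"
pods = [f"{app}-{i}" for i in range(3)]  # StatefulSet ordinals always restart at 0

# First deployment: Juju numbers units from 0, matching the pod ordinals.
first_units = [f"{app}/{i}" for i in range(3)]
# Redeployment under the same name: unit numbers assumed to continue from 3.
redeploy_units = [f"{app}/{i}" for i in range(3, 6)]

for label, units in (("first deploy", first_units), ("redeploy", redeploy_units)):
    masters = [p for p in pods if is_master(pod_derived_unit(app, p), units)]
    print(label, masters)
# first deploy ['postgresql-0']
# redeploy []
```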

There are two things to consider when fixing this:

- construct JUJU_UNIT_NAME properly, so that an actual Juju unit name is returned (currently done by calling hookenv.local_unit())

- avoid race conditions during master election by spinning up pods serially, via "service": {"scalePolicy": "serial"} in the pod spec
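A minimal sketch of both points, under stated assumptions: `is_master` is a hypothetical name, the local unit name is assumed to come from Juju itself (for example charmhelpers' `hookenv.local_unit()`, which is not imported here), and only the `scalePolicy` fragment quoted above is shown, with all other pod spec fields omitted.

```python
def is_master(local_unit: str, expected_units: list[str]) -> bool:
    """Sketch of the corrected check: compare the unit name Juju reports for
    this unit against the lowest-numbered expected unit, instead of a name
    derived from the pod ordinal."""
    first = min(expected_units, key=lambda name: int(name.rsplit("/", 1)[1]))
    return local_unit == first


# Pod spec fragment asking for pods to be started one at a time, reducing the
# chance of a race during master election. Other pod spec fields are omitted.
POD_SPEC_FRAGMENT = {"service": {"scalePolicy": "serial"}}

redeploy_units = ["postgresql/3", "postgresql/4", "postgresql/5"]
print(is_master("postgresql/3", redeploy_units))  # True
print(is_master("postgresql/4", redeploy_units))  # False
```

Because both sides of the comparison now come from Juju, the check keeps working after a redeploy, whatever ordinals the StatefulSet assigns to the pods.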

Tom Haddon (mthaddon)
Changed in charm-k8s-postgresql:
status: New → Confirmed
importance: Undecided → Medium
Tom Haddon (mthaddon) wrote :

This has been fixed in revno 20, released to the stable channel.

Changed in charm-k8s-postgresql:
status: Confirmed → Fix Released