MAAS

Bug #1817484
Comment #5

Dmitrii Shcherbakov (dmitriis) wrote on 2019-02-25: Re: [2.5] MAAS does not guarantee DNS dns record updates propagation and returns 200 prematurely

Andres,

Logs from maas-vhost2 (the current DB master):
https://private-fileshare.canonical.com/~dima/maas-dumps/2019-02-25-maas-vhost2-etc-var-log.tar.gz

Logs from maas-vhost1 (the failed master). I used libguestfs to extract the logs from the killed machine, which is offline. You will see garbage at the end of the region log file there because the page cache was not flushed to disk before the forced shutoff:
https://private-fileshare.canonical.com/~dima/maas-dumps/2019-02-25-maas-vhost1-etc-var-log.tar.gz

1. Yes, the record is created first (successfully); a Pacemaker resource agent then updates it when the node the record points to fails over.

In this case, the record points to maas-vhost1, which also happens to be the PostgreSQL DB master.

When maas-vhost1 is killed, the record is updated from whichever node Pacemaker chooses to start the "dns record resource" (called res_maas_region_hostname) on; in this case, maas-vhost3.

2. The DB fails over to maas-vhost2, and there is an ordering constraint on res_maas_region_hostname so that Pacemaker does not attempt to start it (on maas-vhost3 or maas-vhost2) until the DB VIP resource has started. The DB VIP, in turn, only starts after a PostgreSQL master is available.

So the sequence is: (1) the DB failover completes, then (2) the DNS update is issued from maas-vhost3.

https://pastebin.canonical.com/p/ZZKMvRWDQN/
order ord_promote inf: ms_pgsql:promote res_pgsql_vip:start symmetrical=false
order ord_hostname_start inf: res_pgsql_vip:start res_maas_region_hostname:start symmetrical=false

3. The DB contains the new record, so either the notifications from PostgreSQL somehow do not get delivered from the new master, or MAAS does not process them correctly.
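The ordering in item 2 can be illustrated with a small model: treat each crm `order` constraint as an edge "first action -> then action" and sort the actions topologically. This is a simplified sketch using Python's `graphlib` (3.9+), not how Pacemaker's scheduler actually evaluates constraints; it only shows why res_maas_region_hostname cannot start before the PostgreSQL promotion and the DB VIP start. The dictionary and function names are illustrative.

```python
from graphlib import TopologicalSorter

# The two ordering constraints from the crm configuration above, written as
# constraint id -> (first action, then action). Both have score inf
# (mandatory) and symmetrical=false, so no reverse ordering is implied for
# the corresponding demote/stop actions.
ORDER_CONSTRAINTS = {
    "ord_promote": ("ms_pgsql:promote", "res_pgsql_vip:start"),
    "ord_hostname_start": ("res_pgsql_vip:start", "res_maas_region_hostname:start"),
}


def start_order(constraints):
    """Return actions in an order that satisfies every (first, then) pair."""
    sorter = TopologicalSorter()
    for first, then in constraints.values():
        # add(node, *predecessors): 'then' may only run after 'first'.
        sorter.add(then, first)
    return list(sorter.static_order())


if __name__ == "__main__":
    for step, action in enumerate(start_order(ORDER_CONSTRAINTS), 1):
        print(step, action)
```

Because the two constraints form a single chain, there is exactly one valid order: promote the PostgreSQL master, start the DB VIP, then start the DNS record resource.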

> do the region controllers get restarted

No, we do not restart them; we just expect the DB notifications to arrive and trigger a bind9 reload.
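Relying on notifications alone has a known gap in PostgreSQL: LISTEN is registered per session, and NOTIFY is delivered only to sessions listening at that moment, so anything sent while a client is still reconnecting to the new master is never replayed. The toy in-memory model below illustrates that gap and a common mitigation (re-read current state after every reconnect). It is a hypothetical sketch with no real PostgreSQL connection; `FakeServer`, `RegionListener`, the `dns_update` channel and the address strings are invented for illustration, it is not MAAS's implementation, and it does not claim to be the cause of this bug.

```python
class FakeServer:
    """Toy stand-in for a PostgreSQL server: a NOTIFY reaches only the
    sessions that are LISTENing on this server at the time it is sent."""

    def __init__(self, name):
        self.name = name
        self.listeners = {}  # channel -> list of callbacks
        self.records = {}

    def listen(self, channel, callback):
        self.listeners.setdefault(channel, []).append(callback)

    def write_and_notify(self, channel, key, value):
        # Models a committed UPDATE followed by NOTIFY: store the row,
        # then tell whoever is listening on this server right now.
        self.records[key] = value
        for callback in self.listeners.get(channel, []):
            callback(key)


class RegionListener:
    """Hypothetical region-side listener that 'reloads DNS' on notification."""

    CHANNEL = "dns_update"  # hypothetical channel name

    def __init__(self):
        self.server = None
        self.dns_view = {}

    def connect(self, server, resync=True):
        self.server = server
        server.listen(self.CHANNEL, self._on_notify)
        if resync:
            # Notifications sent before LISTEN are never replayed, so
            # re-read the current state after every (re)connect.
            self.dns_view = dict(server.records)

    def _on_notify(self, key):
        self.dns_view[key] = self.server.records[key]


def simulate(resync):
    old_master = FakeServer("maas-vhost1")
    old_master.records["region"] = "old-address"
    region = RegionListener()
    region.connect(old_master)
    # Failover: maas-vhost2 is promoted with a replicated copy of the data.
    new_master = FakeServer("maas-vhost2")
    new_master.records = dict(old_master.records)
    # The record is updated (and NOTIFY sent) before the region reconnects,
    # so nobody is listening on the new master yet.
    new_master.write_and_notify(RegionListener.CHANNEL, "region", "new-address")
    region.connect(new_master, resync=resync)
    return region.dns_view["region"]


if __name__ == "__main__":
    print(simulate(resync=False))  # old-address: the notification was missed
    print(simulate(resync=True))   # new-address: the resync picked it up
```

The design point is that a notification channel is best treated as a hint to re-read state, not as the only source of truth, whenever the listening session can be lost.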