Comment 6 for bug 1817484

Vern Hart (vern) wrote : Re: [2.5] MAAS does not guarantee DNS dns record updates propagation and returns 200 prematurely

Andres,

I'll let Dimitrii respond with the logs from the other region servers. Note that one server is down, so there will only be two sets of logs.

The scenario is part of a failure test for a customer. The three MAAS nodes are on different subnets, so they use dns-ha instead of a VIP. MAAS is accessed through a hostname, maas-region.test, which Pacemaker moves to another node when the node currently holding the name fails.

To be clear, this is an unclean loss of one server: the PostgreSQL master.

So a specific scenario is (my host assignments may differ from those in Dima's reports):

PostgreSQL master is maas-vhost1
PostgreSQL slave is maas-vhost2
maas-region.test is on maas-vhost1

1. Cut power to maas-vhost1
2. Pacemaker promotes maas-vhost2 to pgsql master and makes maas-vhost3 the new slave
3. Pacemaker moves the hostname to (starts it on) maas-vhost2 (or 3)

Pacemaker does not monitor regiond, so it does not restart those services; they have to wait for the database to become available again.

At step 3, the first one or two starts may fail (HTTP 500 server error) while the new master starts up and becomes available to MAAS. Eventually, though, Pacemaker succeeds in running the script that talks to MAAS to assign the new host's IP to maas-region.test. When this happens, querying MAAS (or PostgreSQL directly) shows the new IP, but querying DNS (or the zone files directly) still shows the old IP address. This holds true no matter which MAAS server we query.
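Because MAAS returns 200 before the DNS change has actually propagated, one possible workaround (a sketch only, not something MAAS or Pacemaker provides) is for the failover script to poll DNS after the API call and treat the move as failed until the name resolves to the new IP. The helper `wait_for_dns` and its parameters are hypothetical. By default it uses the system resolver via `socket.gethostbyname`, which may cache answers or may not point at the MAAS region's DNS servers; to check each region server directly, pass a different `resolve` function.

```python
import socket
import time
from typing import Callable, Optional


def wait_for_dns(
    hostname: str,
    expected_ip: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
    timeout: float = 60.0,
    interval: float = 2.0,
) -> bool:
    """Hypothetical helper: poll until `hostname` resolves to `expected_ip`.

    Returns True as soon as the answer matches, or False if it still
    does not match when `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        answer: Optional[str]
        try:
            answer = resolve(hostname)
        except OSError:
            # socket.gaierror (e.g. NXDOMAIN, no reachable server) is an
            # OSError subclass; treat it as "not updated yet".
            answer = None
        if answer == expected_ip:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In the Pacemaker script, this would run right after MAAS accepts the new IP for maas-region.test, for example `wait_for_dns("maas-region.test", new_ip, timeout=120)`, and the script would exit non-zero if it returns False. That way the resource is not reported as started while DNS still serves the old address.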