Trailing dot in fqdn of ps5 VMs regresses some tests (e.g. postfix)

Bug #2019472 reported by Paride Legovini
This bug affects 1 person
Affects: Auto Package Testing
Status: Fix Released
Importance: Undecided
Assigned to: Paride Legovini
Milestone: (not set)

Bug Description

On ps5 amd64 VMs (lcy02) we have that:

$ hostname --fqdn
adt-paride-mantic-postfix.openstack.prodstack5.lan.

Note the trailing dot. This causes regressions in some packages, see for example LP: #2019195. Ideally we should fix the regressed packages (the trailing dot is technically correct), but it makes sense to work around the problem while the packages are getting fixed. Some possibilities are:

(1) Remove `manage_etc_hosts: true` from the cloud-init user-data used by the create-nova-image-new-release script. This should prevent the hostname from being set to the name advertised by openstack, leaving it as the static string 'autopkgtest'. I did some git archeology and manage_etc_hosts initially came from this snippet:

# unbreak my server option :-(
userdata=`mktemp`
trap "rm $userdata" EXIT TERM INT QUIT PIPE
/bin/echo -e "#cloud-config\nmanage_etc_hosts: True" > $userdata

and that was the *only* use of user-data back then. Looks like it was a workaround for some issue? In any case I can't be sure that disabling manage_etc_hosts won't cause other issues.

(2) Remove the trailing dot in /etc/hosts using the setup-canonical.sh script, e.g.

  sed -Ei '/^127\.0\.1\.1 /s/([a-z])\. /\1 /' /etc/hosts

I don't think this can be racy with cloud-init reconfiguring the hostname, because testbed-setup removes cloud-init. Cons of this approach: very hacky. We ask cloud-init to manage /etc/hosts (via manage_etc_hosts), but then mangle it manually (not nice).
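To make the effect of the expression in (2) concrete, here is a minimal sketch that applies it to a scratch file instead of the real /etc/hosts. The sample content and the `strip_trailing_dot` helper name are made up for illustration; only the sed expression itself comes from this report.

```shell
#!/bin/sh
# Sketch: run the sed expression from option (2) against a scratch copy,
# so the real /etc/hosts is never touched.
strip_trailing_dot() {
    # $1: path to a hosts-style file; prints the rewritten content on stdout.
    sed -E '/^127\.0\.1\.1 /s/([a-z])\. /\1 /' "$1"
}

sample=$(mktemp)
# Hypothetical content, modelled on the fqdn shown in the bug description.
printf '%s\n' \
    '127.0.1.1 adt-test.openstack.prodstack5.lan. adt-test' \
    '127.0.0.1 localhost' > "$sample"

strip_trailing_dot "$sample"
rm -f "$sample"
```

Note that the expression only rewrites the 127.0.1.1 line, and only when the dot follows a lowercase letter and is itself followed by a space, so a name that ends the line or ends in a digit would be left alone.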


Paride Legovini (paride)
description: updated
Revision history for this message
Paride Legovini (paride) wrote :

Postfix is not trivial to fix, see discussion on:

https://bugs.launchpad.net/ubuntu/+source/postfix/+bug/2019195

As discussed we're moving ahead with solution (2).

Revision history for this message
Paride Legovini (paride) wrote :

We now need to update the charm.

Changed in auto-package-testing:
status: New → Fix Committed
Paride Legovini (paride)
Changed in auto-package-testing:
assignee: nobody → Paride Legovini (paride)
Revision history for this message
Brian Murray (brian-murray) wrote :

This has landed in production now.

Changed in auto-package-testing:
status: Fix Committed → Fix Released
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

I'm still seeing amavisd-new autopkgtests failing on hosts where the hostname has a trailing dot, like on this run from April 12th, 2024[1]:

889s Running newaliases
889s newaliases: warning: valid_hostname: misplaced delimiter: adt-noble-amd64-amavisd-new-20240412-035435-juju-7f2275-prod-pr.openstack.prodstack5.lan.
889s newaliases: fatal: file /etc/postfix/main.cf: parameter myhostname: bad parameter value: adt-noble-amd64-amavisd-new-20240412-035435-juju-7f2275-prod-pr.openstack.prodstack5.lan.
889s dpkg: error processing package postfix (--configure):

1. https://autopkgtest.ubuntu.com/results/autopkgtest-noble/noble/amd64/a/amavisd-new/20240412_040959_d1c5f@/log.gz
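A quick way to check whether a given testbed would hit this failure is to test the fqdn for a trailing dot before postfix gets installed. This is only a sketch: the `has_trailing_dot` helper is a hypothetical name, and the hard-coded fqdn stands in for the output of `hostname --fqdn`.

```shell
#!/bin/sh
# Sketch: report whether a name ends with a dot, as the ps5 fqdns do.
has_trailing_dot() {
    case "$1" in
        *.) return 0 ;;
        *)  return 1 ;;
    esac
}

# On a testbed this would be: fqdn=$(hostname --fqdn)
fqdn='adt-paride-mantic-postfix.openstack.prodstack5.lan.'
if has_trailing_dot "$fqdn"; then
    # ${fqdn%.} is the same name with the single trailing dot removed.
    echo "trailing dot present, name without it: ${fqdn%.}"
fi
```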

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :
Paride Legovini (paride)
Changed in auto-package-testing:
status: Fix Released → Incomplete
Revision history for this message
Paride Legovini (paride) wrote :

I clearly see the past failures, but I can't reproduce the issue anymore, either manually (on the autopkgtest infra itself) or using the service "normally". See for example how I triggered some mime-construct runs and they passed:

https://autopkgtest.ubuntu.com/packages/mime-construct/noble/amd64

NEWS:

I spotted a difference between "bad" and "good" runs. Bad runs appear to have this step in the log:

  autopkgtest [05:03:11]: rebooting testbed after setup commands that affected boot

while "good" runs do not appear to have it. So maybe something is resetting the hostname on reboot? I tried manually and it didn't happen, but I still think this is a good clue to follow.
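To sort a batch of downloaded logs by this clue, something like the following sketch could be used. It assumes the logs are gzip-compressed files (like the log.gz links in this report) passed as arguments; the `rebooted_after_setup` helper name is made up for illustration.

```shell
#!/bin/sh
# Sketch: classify autopkgtest logs by whether the testbed rebooted after setup.
marker='rebooting testbed after setup commands that affected boot'

rebooted_after_setup() {
    # $1: path to a gzip-compressed log, e.g. a downloaded log.gz.
    # The exit status is grep's: 0 if the marker line is present.
    gzip -dc "$1" | grep -F "$marker" > /dev/null
}

for log in "$@"; do
    if rebooted_after_setup "$log"; then
        echo "$log: rebooted after setup (same as the bad runs)"
    else
        echo "$log: no reboot after setup"
    fi
done
```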

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Who is adding the trailing dot to the fqdn? The DNS server? Can we tackle this at the source perhaps?

Revision history for this message
Paride Legovini (paride) wrote :

I still can't reproduce the issue, but I have some more data points. Taking [1] as an example of a "bad" run, we can see that:

 - /etc/hosts is mangled via sed (not visible from the logs)
 - hostname (the package) and cloud-init get upgraded
 - the system reboots

*something* caused the `sed` to have no effect (/etc/hosts got rewritten? by what? the cloud-init upgrade caused some sort of race?). I tried downgrading the packages, but still couldn't reproduce the problem.

Maybe the `sed` should be moved to autopkgtest --setup-commands-boot.

For reference, [2] is the postfix bug about it not being able to handle the trailing dot.

[1] https://autopkgtest.ubuntu.com/results/autopkgtest-noble/noble/amd64/m/mime-construct/20240412_050518_cab97@/log.gz
[2] https://bugs.launchpad.net/ubuntu/+source/postfix/+bug/2019195

Revision history for this message
Bryce Harrington (bryce) wrote :

This is currently preventing logcheck from passing autopkgtest when triggered against postfix:

https://autopkgtest.ubuntu.com/packages/l/logcheck/noble/amd64

1186s Postfix (main.cf) is now set up with a default configuration. If you need to
1186s make changes, edit /etc/postfix/main.cf (and others) as needed. To view
1186s Postfix configuration values, see postconf(1).
1186s
1186s After modifying main.cf, be sure to run 'systemctl reload postfix'.
1186s
1188s Running newaliases
1188s newaliases: warning: valid_hostname: misplaced delimiter: adt-noble-amd64-logcheck-20240417-191038-juju-7f2275-prod-propo.openstack.prodstack5.lan.
1188s newaliases: fatal: file /etc/postfix/main.cf: parameter myhostname: bad parameter value: adt-noble-amd64-logcheck-20240417-191038-juju-7f2275-prod-propo.openstack.prodstack5.lan.
1188s dpkg: error processing package postfix (--configure):

Paride Legovini (paride)
Changed in auto-package-testing:
status: Incomplete → In Progress
Revision history for this message
Paride Legovini (paride) wrote :

We also need to disable the update_etc_hosts module, otherwise the /etc/hosts contents are reset.

https://code.launchpad.net/~paride/autopkgtest-cloud/+git/autopkgtest-cloud/+merge/464614

Notes:

* The hostname and cloud-init upgrades I mentioned in an earlier comment were red herrings.

* I tried telling cloud-init to leave /etc/hosts alone by setting `manage_etc_hosts: false` in `/etc/cloud/cloud.cfg`, but that didn't work. The user-data we get from ps5 has an explicit `manage_etc_hosts: true`, so I guess the user-data can override cloud.cfg.

* I really struggled to reproduce the issue. Maybe it's racy (sometimes cloud-init didn't get to update /etc/hosts in time, and we managed to install postfix before that)? But this would be a really slow race. Or maybe there are details of the ps5 infrastructure that I'm missing (different endpoints serving slightly different user-data? Or: user-data settings being changed and then rolled back, so I was trying to reproduce on unstable ground?). I don't know.

In general I think disabling the module makes sense.
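For illustration only, one way to disable a cloud-init module is to drop its entry from the module list in /etc/cloud/cloud.cfg. The sketch below does that on a scratch file; the excerpt and the `disable_update_etc_hosts` helper name are assumptions, and the linked merge proposal may well do it differently.

```shell
#!/bin/sh
# Sketch: drop the update_etc_hosts entry from a cloud.cfg-style module list.
disable_update_etc_hosts() {
    # $1: path to a cloud.cfg-style file; prints the content without the entry.
    sed -E '/^[[:space:]]*-[[:space:]]*update_etc_hosts[[:space:]]*$/d' "$1"
}

cfg=$(mktemp)
# Hypothetical excerpt; a real /etc/cloud/cloud.cfg lists many more modules.
printf '%s\n' \
    'cloud_init_modules:' \
    ' - set_hostname' \
    ' - update_hostname' \
    ' - update_etc_hosts' \
    ' - ssh' > "$cfg"

disable_update_etc_hosts "$cfg"
rm -f "$cfg"
```

The pattern is anchored to a whole list entry so that similarly named modules such as update_hostname are left in place.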

Revision history for this message
Paride Legovini (paride) wrote :

The fix is confirmed working, I'm marking this as Fix Released again.

Changed in auto-package-testing:
status: In Progress → Fix Released
Revision history for this message
Skia (hyask) wrote :

The fix for this issue was later replaced with a cleaner one, which is also confirmed to work and avoids breaking other things, such as `sudo -n /bin/true` on Xenial printing `unable to resolve host autopkgtest` on stderr and thus breaking autopkgtest itself.

Here is the autopkgtest part of the new (hopefully) definitive fix: https://salsa.debian.org/ci-team/autopkgtest/-/merge_requests/334
And here is the autopkgtest-cloud part: https://code.launchpad.net/~hyask/autopkgtest-cloud/+git/autopkgtest-cloud/+merge/464979
