"all" mode snap upgrade from 2.7 to 2.8 fails

Bug #1938321 reported by Alberto Donato
This bug affects 1 person
Affects  Status        Importance  Assigned to     Milestone
MAAS     Fix Released  Critical    Alberto Donato
snapd    Invalid       Undecided   Unassigned

Bug Description

This can be reproduced easily by installing maas 2.7 in "all" mode and trying to upgrade to 2.8:

snap install maas --channel=2.7
maas init --mode all

root@f2:~# snap list maas
Name Version Rev Tracking Publisher Notes
maas 2.7.3-8290-g.ebe2b9884 8724 2.7/stable canonical✓ -
root@f2:~# snap refresh maas --channel=2.8
error: cannot perform the following tasks:
- Run post-refresh hook of "maas" snap if present (run hook "post-refresh": Can't refresh snap in 'all' mode while disabled.)
root@f2:~# snap refresh maas --channel=2.8
error: cannot perform the following tasks:
- Run post-refresh hook of "maas" snap if present (run hook "post-refresh": mv: cannot move '/var/snap/maas/13516/var/lib/maas' to '/var/snap/maas/common/maas': Directory not empty)

Alberto Donato (ack)
Changed in maas:
assignee: nobody → Alberto Donato (ack)
Alberto Donato (ack) wrote :

The issue is due to this check in maas' post-refresh hook:

    if snapctl services "$service" | grep -q disabled; then
        echo "Can't refresh snap in 'all' mode while disabled." >&2
        exit 1
    fi

During a refresh, the service is now reported as "disabled", which shouldn't be the case, as it should only be inactive during the upgrade:

+ snapctl services maas.supervisor
Service Startup Current Notes
maas.supervisor disabled inactive -

MAAS checks the service status because, if maas is updated while the service is disabled, postgres won't be running, so it wouldn't be possible to run migrations.

It seems like something might have changed in snapd that's now reporting the service as disabled.
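For illustration, the two state columns can be read separately instead of grepping the whole line. This is only a sketch, not the fix that went into the maas snap: the helper names service_startup and service_current are hypothetical, and it assumes the four-column "snapctl services" layout shown above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the maas hook): read
# `snapctl services`-style output on stdin and print one column
# for the named service. Assumes the column order
# "Service Startup Current Notes" seen in the output above.

# Print the Startup column (enabled/disabled) for service $1.
service_startup() {
    awk -v svc="$1" '$1 == svc { print $2 }'
}

# Print the Current column (active/inactive) for service $1.
service_current() {
    awk -v svc="$1" '$1 == svc { print $3 }'
}
```

Inside a hook this would be used as "snapctl services maas.supervisor | service_startup maas.supervisor". With the output above it reports "disabled" for Startup and "inactive" for Current, which is exactly the combination the existing grep trips on.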

Alberto Donato (ack) wrote :

Tested with snapd 2.51.1 and 2.51.3 (current in latest/stable)

Paweł Stołowski (stolowski) wrote :

Reproduced with "snap install maas --channel=2.7; maas init --mode all; snap refresh maas --channel=2.8", but as far as I can tell it's been working like this for 2-3 years if not more, so I'm confused as to why it didn't fail before (unless the "$(maas_snap_mode)" = "all" case in the maas post-refresh hook was not hit before?)

For the record, the failing change on refresh looks like this (hook error output omitted):

Done    today at 11:33 UTC  today at 11:33 UTC  Ensure prerequisites for "maas" are available
Undone  today at 11:33 UTC  today at 11:33 UTC  Download snap "maas" (13516) from channel "2.8"
Done    today at 11:33 UTC  today at 11:33 UTC  Fetch and check assertions for snap "maas" (13516)
Undone  today at 11:33 UTC  today at 11:33 UTC  Mount snap "maas" (13516)
Undone  today at 11:33 UTC  today at 11:33 UTC  Run pre-refresh hook of "maas" snap if present
Undone  today at 11:33 UTC  today at 11:33 UTC  Stop snap "maas" services
Undone  today at 11:33 UTC  today at 11:33 UTC  Remove aliases for snap "maas"
Undone  today at 11:33 UTC  today at 11:33 UTC  Make current revision for snap "maas" unavailable
Undone  today at 11:33 UTC  today at 11:33 UTC  Copy snap "maas" data
Undone  today at 11:33 UTC  today at 11:33 UTC  Setup snap "maas" (13516) security profiles
Undone  today at 11:33 UTC  today at 11:33 UTC  Make snap "maas" (13516) available to the system
Undone  today at 11:33 UTC  today at 11:33 UTC  Automatically connect eligible plugs and slots of snap "maas"
Undone  today at 11:33 UTC  today at 11:33 UTC  Set automatic aliases for snap "maas"
Undone  today at 11:33 UTC  today at 11:33 UTC  Setup snap "maas" aliases
Error   today at 11:33 UTC  today at 11:33 UTC  Run post-refresh hook of "maas" snap if present
Hold    today at 11:33 UTC  today at 11:33 UTC  Start snap "maas" (13516) services
Hold    today at 11:33 UTC  today at 11:33 UTC  Clean up "maas" (13516) install
Hold    today at 11:33 UTC  today at 11:33 UTC  Run configure hook of "maas" snap if present
Hold    today at 11:33 UTC  today at 11:33 UTC  Run health check of "maas" snap
Done    today at 11:33 UTC  today at 11:33 UTC  Handling re-refresh of "maas" as needed

The critical moment is the 'Make current revision for snap "maas" unavailable' task: we unlink the old snap revision and remove the wrappers for all services (they were stopped earlier by 'Stop snap "maas" services'), then call systemctl daemon-reload. We then link the new revision of the snap ('Make snap "maas" (13516) available to the system'), which also re-creates the systemd units for the services, but nothing gets started yet, so to the post-refresh hook the service appears inactive and disabled. The services only get started and enabled (unless explicitly disabled by the user) by 'Start snap "maas" (13516) services'.

Does this explanation make sense?
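
The ordering described above can be checked mechanically against a saved task listing. A minimal sketch, assuming the listing has one task per line with the summaries pasted in this comment; start_relative_to_hook is a hypothetical helper, not part of snapd or maas.

```shell
#!/bin/sh
# Hypothetical helper: read a task listing on stdin and report
# whether the "Start snap ... services" task is listed after the
# "Run post-refresh hook" task ("after"), before it ("before"),
# or whether either task is missing ("unknown").
start_relative_to_hook() {
    awk '
        /Run post-refresh hook/  { hook = NR }   # line of the hook task
        /Start snap .* services/ { start = NR }  # line of the start task
        END {
            if (hook && start)
                print (start > hook ? "after" : "before")
            else
                print "unknown"
        }
    '
}
```

Fed the listing above (for example via "snap tasks <change-id> | start_relative_to_hook", with the change id taken from "snap changes"), this prints "after": the services are only started once the post-refresh hook has already run.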

Alberto Donato (ack)
Changed in maas:
status: Triaged → In Progress
Paweł Stołowski (stolowski) wrote :

Small clarification: when the old revision of the snap gets unlinked and systemd units get removed, we also explicitly disable them before removal; all the above explanation remains true with regard to post-refresh hook and starting and enabling services afterwards.

Alberto Donato (ack)
Changed in maas:
status: In Progress → Fix Committed
Heather Lemon (hypothetical-lemon) wrote :

From focal lxc

snap install maas --channel=2.7
maas (2.7/stable) 2.7.3-8290-g.ebe2b9884 from Canonical✓ installed

sudo snap list
Name Version Rev Tracking Publisher Notes
maas 2.7.3-8290-g.ebe2b9884 8724 2.7/stable canonical✓ -
maas-cli 0.6.5 13 latest/stable canonical✓ -

executed command:

snap refresh maas --channel=2.8/edge
#output #completes successfully
maas (2.8/edge) 2.8.6-8609-g.d77f501ee from Canonical✓ refreshed

#print output
root@focal-systemd:~# sudo snap list maas
Name Version Rev Tracking Publisher Notes
maas 2.8.6-8609-g.d77f501ee 15848 2.8/edge canonical✓ -

Thanks,
Heather Lemon

Pedro Victor Lourenço Fragola (pedrovlf) wrote :

Using maas channel=2.8/edge for the upgrade was also successful during my tests.

Paweł Stołowski (stolowski) wrote :

Closing this bug in snapd project per my earlier explanation; it's my understanding the problem has been solved in maas snap.

Changed in snapd:
status: New → Invalid
Changed in maas:
status: Fix Committed → Fix Released