Hi Stuart,

Thanks for the suggestions here about the upgrade-charm lifecycle.

What I ended up doing yesterday to ease the upgrade process is the following. The upgrade_charm hook will:

 1. stop postgresql;
 2. umount /srv/juju/vol-id;
 3. unlink /var/lib/postgresql/9.1/main;
 4. mount LABEL=vol-id /srv/data;
 5. symlink /var/lib/postgresql/9.1/main -> /srv/data/postgresql/9.1/main;
 6. chown -h postgres:postgresql /var/lib/postgresql/9.1/main;
 7. log a warning about the manual procedure to add the storage and block-storage-broker services to ensure ongoing use of the existing volume via storage's volume_map. The warning looks like the following:

WARNING: postgresql/0 unit has external volume id 123-123-123-123 mounted via the deprecated volume-map and volume-ephemeral-storage configuration parameters. These parameters are no longer available in the postgresql charm in favor of using the volume_map parameter in the storage subordinate charm. We are migrating the attached volume to a mount path which can be managed by the storage subordinate charm. To continue using this volume_id with the storage subordinate, follow this procedure.
-----------------------------------
1. cat > storage.cfg <

> This sounds reasonable. Assuming the directory structures match, no
> large database files need to be shuffled around. I am unsure if the
> last task in step 5 will actually work,

Good point Stuart; as I found too, it didn't work. Even with the resolved --retry, it would still fail at the following config_changed hook that fires after the upgrade_charm hook, because I had originally left PG in an unusable state (with the intent of allowing the storage subordinate relation to set up the mounts again).

So, I've made sure all mount/link work is now done in the upgrade_charm hook, and the postgresql service is still running after remounting the device in the proper location for the storage subordinate to handle it. We will still exit(1) to alert the juju admin to the manual juju deploy & add-relation steps needed for ongoing volume support. But in this new approach the service still has integrity, and a resolved --retry will succeed (as will the following config_changed hook).

> After the operator has performed the tasks given by step 5, we have a
> PostgreSQL unit in a failed state and a subordinate charm with pending
> relation-hooks. Will 'juju resolved' or 'juju resolved --retry' be all
> that is required?

As you pointed out before, that process was not quite working; it had me exercise the upgrade path and sort out the kinks. Now, if the steps are followed, even after step 0 (juju upgrade-charm postgresql) the postgresql service is still running and the volume is remounted at a path on the unit that is compatible with the storage subordinate. The juju resolved --retry postgresql/0 (in step 5) is essentially a noop, because the upgrade_charm method will not attempt to move mount locations if the link_target of /var/lib/postgresql/9.1/main does not contain /srv/juju (it contains /srv/data, the new mount point). The resolved will, though, clear the hook failure and allow the add-relation hooks to fire in step 6.
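To make that concrete, the migration and the guard in upgrade_charm look roughly like this (a simplified sketch, not the exact hook code; the vol-id and 9.1 paths are just the ones from the steps above):

    link_target=$(readlink /var/lib/postgresql/9.1/main)
    case "$link_target" in
      /srv/juju/*)
        # Old deprecated mount still in place: migrate it.
        service postgresql stop
        umount /srv/juju/vol-id
        unlink /var/lib/postgresql/9.1/main
        mkdir -p /srv/data
        mount LABEL=vol-id /srv/data
        ln -s /srv/data/postgresql/9.1/main /var/lib/postgresql/9.1/main
        chown -h postgres:postgres /var/lib/postgresql/9.1/main  # default Debian group assumed
        service postgresql start
        juju-log "WARNING: manual storage subordinate steps required ..."
        exit 1   # fail the hook so the admin sees the warning; PG keeps running
        ;;
      *)
        # Link already points at /srv/data: nothing to move, so re-running
        # this hook via resolved --retry is effectively a noop.
        ;;
    esac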
> An alternative would be to not have upgrade-charm fail, but instead
> set a flag ensuring that PG will not start up until the subordinate
> storage charm has been successfully related. Disable autostart,
> local_state could store the flag, and postgresql_start would check
> for it. It might not be this simple though, as other hooks like
> config-changed would likely fail

Agreed; yeah, that approach did have more impact than just having upgrade_charm do the appropriate work to ensure postgresql could continue running in a state not yet managed by the storage subordinate. At least with the current approach I implemented above, PG is still running and will continue running even if the admin forgets to apply the manual steps to ensure the storage subordinate maintains this volume long term.

> And another alternative.... upgrade-charm just leaves the storage
> mounted and the storage hooks fix up the situation (either by copying
> the files to the new mount and leaving the old mount dangling for the
> operator to clean up, or detecting the situation and remounting the
> existing storage correctly).

This was the suggestion of yours that I followed most closely: remount, relink and restart, so storage can take the reins when it becomes related.

Thanks again for the good thoughts.
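P.S. For reference, the manual follow-up that the warning asks for looks roughly like the following (the exact storage.cfg contents and the block-storage-broker credentials are environment-specific, so they are omitted here; relation endpoint names may also need to be given explicitly):

    juju deploy --config storage.cfg storage
    juju deploy block-storage-broker
    juju add-relation block-storage-broker storage
    juju resolved --retry postgresql/0    # step 5: safe, the remount logic is a noop by now
    juju add-relation storage postgresql  # step 6: storage subordinate takes over the mount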