fence_azure_arm fails with 'ServicePrincipalCredentials' object has no attribute 'get_token'

Bug #1990316 reported by charles
This bug affects 1 person
Affects                Status        Importance  Assigned to  Milestone
fence-agents (Ubuntu)  Fix Released  Undecided   Robie Basak
  Jammy                Fix Released  Undecided   Robie Basak
  Kinetic              Fix Released  Undecided   Unassigned

Bug Description

[Impact]

The Azure-specific fence agent fence_azure_arm does not work at all on Jammy. It's needed for HA fencing on Azure.

[Test Plan A]

Follow Steps 1 through 3 for Focal at https://discourse.ubuntu.com/t/ubuntu-ha-ms-sql-server-on-azure/27673 but use Jammy instead. Installing SQL Server isn't relevant for this bug on Jammy - that's not available for Jammy yet. We're fixing fencing first which is a prerequisite.

Once the fence agent resource is configured, "sudo crm status" should produce output as described in the tutorial. Then run "iptables -I INPUT -j DROP" on vm2. This will of course lock you out, but shortly afterwards vm2 gets rebooted automatically by one of the other nodes, as can be verified by seeing the contents of /proc/sys/kernel/random/boot_id change.
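
The boot_id comparison can be scripted. The following is a minimal sketch, not part of the agreed test plan; the helper names read_boot_id and has_rebooted are made up for illustration. Record the value on vm2 before blocking traffic, then read it again once the node is reachable.

```python
from pathlib import Path

# Kernel-provided identifier that is regenerated on every boot.
BOOT_ID_PATH = Path("/proc/sys/kernel/random/boot_id")


def read_boot_id(path: Path = BOOT_ID_PATH) -> str:
    """Return the boot ID stored at `path`, without the trailing newline."""
    return path.read_text().strip()


def has_rebooted(before: str, after: str) -> bool:
    """Return True if the two boot IDs differ, meaning a reboot happened in between."""
    return before.strip() != after.strip()
```

For example, save the output of read_boot_id() before running the iptables command, and call has_rebooted(saved, read_boot_id()) after logging back in to vm2.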

In the failure case, "sudo crm status" will display output like the following, and of course no reboot occurs:

Node List:
  * Node myVM2: UNCLEAN (offline)
  * Online: [ myVM1 myVM3 ]

Full List of Resources:
  * fence-vm (stonith:fence_azure_arm): Stopped

Failed Resource Actions:
  * fence-vm start on myVM1 returned 'error' at Thu Nov 17 14:48:04 2022 after 3.388s
  * fence-vm start on myVM3 returned 'error' at Thu Nov 17 14:48:11 2022 after 2.193s

Failed Fencing Actions:
  * reboot of myVM2 failed: delegate=, client=pacemaker-controld.4281, origin=myVM1, last-failed='1970-01-01 00:12:46Z'
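
The failure markers in the output above can be checked mechanically. This is a rough sketch that assumes "crm status" output shaped like the sample shown here; the format can differ between crmsh and pacemaker versions, and the function name summarize_crm_status is hypothetical.

```python
import re


def summarize_crm_status(text: str) -> dict:
    """Extract the failure markers seen in the sample output above."""
    # Nodes reported as "Node <name>: UNCLEAN (...)".
    unclean = re.findall(r"Node (\S+): UNCLEAN", text)
    # Stonith resources reported as "<name> (stonith:<agent>): Stopped".
    stopped = re.findall(
        r"^[ \t]*\*[ \t]+(\S+)[ \t]+\(stonith:[^)]*\):[ \t]+Stopped",
        text,
        flags=re.MULTILINE,
    )
    return {
        "unclean_nodes": unclean,
        "stopped_stonith": stopped,
        # This section heading appears in the failing sample output.
        "fencing_failed": "Failed Fencing Actions:" in text,
    }
```

Applied to the sample output, this would report myVM2 as unclean and fence-vm as a stopped stonith resource.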

[Test Plan B]

Instead of configuring pacemaker with the fence agent and using iptables to fake a loss of network connectivity so that pacemaker detects the failure and fences the node, it's possible to run the fence agent manually as follows:

fence_azure_arm --action=reboot --plug=myVM2 --resourceGroup="$resource_group" --login="$secret_id" --username="$application_id" --password="$password" --tenantId="$tenant_id" --subscriptionId="$subscription_id" --login-timeout=5 --power-timeout=60 --verbose < /dev/null

This prints lots of debug output, exits 0 and should cause a reboot of vm2. In the failure case, it exits non-zero with "AttributeError: 'ServicePrincipalCredentials' object has no attribute 'get_token'. Did you mean: 'set_token'?" and no reboot occurs.
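
The pass/fail criteria for Test Plan B can be written down as a small classifier. This is only a sketch: classify_run and its return labels are illustrative names, and it assumes the caller has already captured the agent's exit status and combined output.

```python
# Error text quoted in this bug report for the failure case.
BUG_SIGNATURE = "'ServicePrincipalCredentials' object has no attribute 'get_token'"


def classify_run(returncode: int, output: str) -> str:
    """Classify one fence_azure_arm run against the Test Plan B criteria."""
    if BUG_SIGNATURE in output:
        return "bug-present"    # the failure described in this report
    if returncode == 0:
        return "ok"             # the agent exited 0; vm2 should reboot
    return "other-failure"      # non-zero exit for an unrelated reason
```

The exit status and output could come from subprocess.run(..., capture_output=True, text=True) wrapped around the command above. A result of "ok" still needs the reboot of vm2 to be confirmed separately.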

[Where problems could occur]

We're changing code that is only used by Azure-specific fence agents, which we believe is fence_azure_arm only.

There could be some way that users are successfully using fence_azure_arm today that we don't know about, and this change could then regress these users. But as far as we're aware, the requirement to use the new API is a hard requirement for this version of python3-azure.

There could be a further bug in an area not exercised by the Test Plan which would be harder to fix in a further SRU without affecting existing users, after this SRU makes it work.

There's the usual "rebuild risk" that could cause a behaviour change in an area of code we haven't touched.

[Other Info]

There seems to be a different bug on Lunar that prevents Test Plan A from succeeding. However, Test Plan B passes, verifying that this bug is fixed on Lunar. The other bug appears to be in crmsh and I'll tackle that separately.

Based on the presence of the required changes, this bug is presumed fixed on Kinetic, but it doesn't seem important to spend time verifying this on Kinetic for this bug since I don't expect anyone to be using Kinetic on Azure with HA.

I've built this proposed SRU in ppa:racb/experimental and verified that both Test Plan A and Test Plan B succeed against it, whereas they both fail without the PPA. For SRU verification I also expect both Test Plans to succeed and will verify both.

[Original Description]

Ubuntu Server 22.04

Installed python3-azure and fence-agents*

The stonith device will not start. Running fence_azure_arm fails with this error:

# fence_azure_arm -l xxxx -p "xxxx" --resourceGroup=xxxx --tenantId=xxxxx --subscriptionId=xxxx -n xxxx
2022-09-20 16:22:14,713 ERROR: Failed: 'ServicePrincipalCredentials' object has no attribute 'get_token'

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: fence-agents 4.7.1-1ubuntu8
ProcVersionSignature: Ubuntu 5.15.0-1019.24-azure 5.15.46
Uname: Linux 5.15.0-1019-azure x86_64
ApportVersion: 2.20.11-0ubuntu82.1
Architecture: amd64
CasperMD5CheckResult: unknown
Date: Tue Sep 20 16:49:42 2022
PackageArchitecture: all
ProcEnviron:
 SHELL=/bin/bash
 LANG=C.UTF-8
 TERM=xterm-256color
 XDG_RUNTIME_DIR=<set>
 PATH=(custom, no user)
SourcePackage: fence-agents
UpgradeStatus: No upgrade log present (probably fresh install)

charles (3-charles) wrote :

I was able to get it working by installing fence-agents-4.11 from Debian.
http://http.us.debian.org/debian/pool/main/f/fence-agents/fence-agents_4.11.0-1+b1_amd64.deb

It seems the included version 4.7 is not compatible with the included python3-azure SDK.

description: updated
description: updated
Robie Basak (racb)
tags: added: server-todo
Changed in fence-agents (Ubuntu):
assignee: nobody → Robie Basak (racb)
Robie Basak (racb) wrote :

Here's the Traceback (after unhiding it from a catch-all exception):

Traceback (most recent call last):
  File "/usr/sbin/fence_azure_arm", line 252, in <module>
    main()
  File "/usr/sbin/fence_azure_arm", line 248, in main
    result = fence_action([compute_client, network_client], options, set_power_status, get_power_status, get_nodes_list)
  File "/usr/share/fence/fencing.py", line 943, in fence_action
    status = get_multi_power_fn(connection, options, get_power_fn)
  File "/usr/share/fence/fencing.py", line 807, in get_multi_power_fn
    plug_status = get_power_fn(connection, options)
  File "/usr/sbin/fence_azure_arm", line 77, in get_power_status
    vmStatus = compute_client.virtual_machines.get(rgName, vmName, "instanceView")
  File "/usr/lib/python3/dist-packages/azure/mgmt/compute/v2021_07_01/operations/_virtual_machines_operations.py", line 675, in get
    pipeline_response = self._client._pipeline.run(request, stream=False, **kwargs)
  File "/usr/lib/python3/dist-packages/azure/core/pipeline/_base.py", line 211, in run
    return first_node.send(pipeline_request) # type: ignore
  File "/usr/lib/python3/dist-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
  File "/usr/lib/python3/dist-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
  File "/usr/lib/python3/dist-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
  [Previous line repeated 2 more times]
  File "/usr/lib/python3/dist-packages/azure/mgmt/core/policies/_base.py", line 47, in send
    response = self.next.send(request)
  File "/usr/lib/python3/dist-packages/azure/core/pipeline/policies/_redirect.py", line 158, in send
    response = self.next.send(request)
  File "/usr/lib/python3/dist-packages/azure/core/pipeline/policies/_retry.py", line 445, in send
    response = self.next.send(request)
  File "/usr/lib/python3/dist-packages/azure/core/pipeline/policies/_authentication.py", line 117, in send
    self.on_request(request)
  File "/usr/lib/python3/dist-packages/azure/core/pipeline/policies/_authentication.py", line 94, in on_request
    self._token = self._credential.get_token(*self._scopes)
AttributeError: 'ServicePrincipalCredentials' object has no attribute 'get_token'. Did you mean: 'set_token'?
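
The last two frames show the mechanism: the authentication policy calls get_token() on whatever credential object it was given, and the old-style credential has no such method. The sketch below reproduces this with stand-in classes only. LegacyCredentials, TokenStyleCredential and authenticate are hypothetical names and not the Azure SDK classes. It does not show the packaged fix.

```python
class LegacyCredentials:
    """Stand-in for an old-style credential: it has set_token() but no get_token()."""

    def __init__(self):
        self.token = None

    def set_token(self, token=None):
        self.token = token


class TokenStyleCredential:
    """Stand-in for a credential exposing the get_token() method the pipeline expects."""

    def get_token(self, *scopes):
        return "fake-token-for-" + ",".join(scopes)


def authenticate(credential, *scopes):
    """Mimic the call in the final traceback frame: credential.get_token(*scopes)."""
    return credential.get_token(*scopes)
```

Calling authenticate(LegacyCredentials(), "some-scope") raises the same kind of AttributeError as above, while authenticate(TokenStyleCredential(), "some-scope") returns a token string.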

Robie Basak (racb) wrote :

Confirmed that this works correctly on Lunar, and reproduces on Jammy. I've not tested Kinetic.

Changed in fence-agents (Ubuntu):
status: New → Fix Released
Changed in fence-agents (Ubuntu Jammy):
status: New → In Progress
assignee: nobody → Robie Basak (racb)
Robie Basak (racb) wrote :

This looks like it was fixed in 4.11.0-1. Kinetic has a higher version than that. I haven't tested Kinetic specifically since it seems unlikely that there will be users attempting HA on Azure on Kinetic, but it should work there anyway since it includes the fix, so I'm marking it Fix Released for Kinetic.

Changed in fence-agents (Ubuntu Kinetic):
status: New → Fix Released
Robie Basak (racb)
description: updated
Andreas Hasenack (ahasenack) wrote : Please test proposed package

Hello charles, or anyone else affected,

Accepted fence-agents into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/fence-agents/4.7.1-1ubuntu8.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in fence-agents (Ubuntu Jammy):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-jammy
charles (3-charles) wrote :

Hello Robie and Andreas,

I have installed the proposed fix and confirm it is working as expected. The fence agent now starts correctly and fencing works when a node fails.

Thank you!

Lena Voytek (lvoytek)
tags: added: verification-done verification-done-jammy
removed: verification-needed verification-needed-jammy
Robie Basak (racb) wrote :

Thank you for the testing, charles!

To double check, I followed the specific agreed test plans documented above. Both Test Plan A and Test Plan B fail using fence-agents 4.7.1-1ubuntu8, and pass using fence-agents 4.7.1-1ubuntu8.1 from jammy-proposed as expected.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package fence-agents - 4.7.1-1ubuntu8.1

---------------
fence-agents (4.7.1-1ubuntu8.1) jammy; urgency=medium

  * Update the Azure fencing agent (fence_azure_arm) to support Azure
    SDK >=15, since this is required by python3-azure in Jammy and this
    fence agent doesn't work without this update (LP: #1990316).

 -- Robie Basak <email address hidden> Wed, 16 Nov 2022 12:11:57 +0000

Changed in fence-agents (Ubuntu Jammy):
status: Fix Committed → Fix Released
Andreas Hasenack (ahasenack) wrote : Update Released

The verification of the Stable Release Update for fence-agents has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
