MAAS detects 0 cores, RAM available for KVM host, reports negative availability on pod compose

Bug #1925249 reported by Michael Skalka
This bug affects 5 people
Affects    Status         Importance   Assigned to       Milestone
MAAS       Fix Released   High         Alberto Donato
  2.9      Won't Fix      Undecided    Unassigned
maas-ui    Fix Released   Unknown

Bug Description

On MAAS 2.9.2, I have a KVM host with 32 cores and 64 GiB of memory:

ubuntu@tyrogue:~$ cat /proc/cpuinfo | grep processor | wc -l
32
ubuntu@tyrogue:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:           62Gi        23Gi       1.8Gi       148Mi        37Gi        38Gi
Swap:           0Gi         0Mi         0Gi

This host is also being used as the MAAS controller. The controller page reports the correct information: 32 cores and 64 GiB of memory. However, the KVM tab shows a different story:

18 of 0 allocated
55834574848 of 0 B allocated
240 of 490 GB allocated

This blocks the creation of additional VMs. Restarting the MAAS snap on this host does not restore functionality.
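
For reference, the figures above are consistent with the UI computing capacity as total multiplied by the over-commit ratio and free space as capacity minus used. The TypeScript sketch below is purely illustrative (it is not the actual maas-ui code); it only shows how a total reported as 0 produces exactly this kind of output:

// Minimal sketch, not the real maas-ui code: "used of capacity" strings
// like those above, where capacity is total scaled by the over-commit
// ratio and free is capacity minus used. A total of 0 makes capacity 0
// and drives the free value negative.
function allocationLine(used: number, total: number, overCommit = 1): string {
  const capacity = total * overCommit;
  const free = capacity - used; // negative when total is reported as 0
  return `${used} of ${capacity} allocated (${free} free)`;
}

// Values from the report above (memory in bytes, storage in GB).
console.log(allocationLine(18, 0, 10));      // cores:   "18 of 0 allocated (-18 free)"
console.log(allocationLine(55834574848, 0)); // memory:  "55834574848 of 0 allocated (-55834574848 free)"
console.log(allocationLine(240, 490));       // storage: "240 of 490 allocated (250 free)"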

Tags: ui

Revision history for this message
Michael Skalka (mskalka) wrote :

sosreport

Revision history for this message
Michael Skalka (mskalka) wrote :

Screencap of current VMS

Revision history for this message
Michael Skalka (mskalka) wrote :

Screencap of "overcommit"

Revision history for this message
Michael Skalka (mskalka) wrote :

Subscribing field-high; this is blocking upgrade testing in the SQA lab.

Revision history for this message
Michael Skalka (mskalka) wrote :

Update: changing the CPU over-commit ratio from 10 to 9.9 did *something* that caused MAAS to re-evaluate the number of available cores and recognize that there were in fact 32 cores on the system.

Alberto Donato (ack)
tags: added: ui
Changed in maas-ui:
importance: Undecided → Unknown
Revision history for this message
Caleb Ellis (caleb-ellis) wrote :

I'm not certain the issue is in the UI, which in 2.9.2 uses the `pod.used`, `pod.free` and `pod.total` values returned from the websocket. FWIW I've only noticed these weird negative and 0 values being reported for virsh KVMs and not LXD. Refreshing seems to fix the numbers temporarily, but changing the over-commit ratios seems to have fixed it "permanently".
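
For context, here is a minimal sketch of the 2.9.2-style read described above, using illustrative shapes rather than the real maas-ui store; the only assumption is that each pod record on the websocket carries `total`, `used` and `free` counter objects:

// Illustrative only: deriving the KVM tab figures directly from the
// pod.total / pod.used / pod.free counters on the websocket pod record.
interface LegacyCounts {
  cores: number;
  memory: number;        // MiB
  local_storage: number; // bytes
}

interface WebsocketPod {
  type: "virsh" | "lxd";
  total: LegacyCounts;
  used: LegacyCounts;
  free: LegacyCounts;
}

function kvmTabSummary(pod: WebsocketPod): string {
  return [
    `${pod.used.cores} of ${pod.total.cores} cores allocated`,
    `${pod.used.memory} of ${pod.total.memory} MiB allocated`,
    `${pod.used.local_storage} of ${pod.total.local_storage} B allocated`,
  ].join("\n");
}

Because these counters live on the stored pod record, they only change when the pod is refreshed, which would line up with refreshing fixing the numbers temporarily.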

Revision history for this message
Narinder Gupta (narindergupta) wrote :

Once we hit this issue we cannot compose machines, so MAAS itself must be holding similar values somewhere. I tried changing the over-commit ratio, but that only helps temporarily and the values revert to negative again.

Revision history for this message
Alberto Donato (ack) wrote :

@Caleb, is the UI now using the info under "resources" for virsh pods too?

I can reproduce this with master, where memory/cpu shows 0 available, while storage is correct.

Looking at the websocket data, I see that total/used/available are correct, while "resources" is empty, which is expected since we collect those only for LXD-based vmhosts:

{
 ...
  "cpu_over_commit_ratio": 1,
  "memory_over_commit_ratio": 1,
 ...
  "type": "virsh",
  "total": {
   "cores": 4,
   "memory": 15860,
   "memory_gb": "15.5",
   "local_storage": 489479471104,
   "local_storage_gb": "455.9"
  },
  "used": {
   "cores": 0,
   "memory": 0,
   "memory_gb": "0.0",
   "local_storage": 0,
   "local_storage_gb": "0.0"
  },
  "available": {
   "cores": 4,
   "memory": 15860,
   "memory_gb": "15.5",
   "local_storage": 489479471104,
   "local_storage_gb": "455.9"
  },
  "composed_machines_count": 0,
  "owners_count": 0,
  "hints": {
   "cores": 4,
   "cpu_speed": 2800,
   "memory": 15860,
   "memory_gb": "15.5",
   "local_storage": 201947750400,
   "local_storage_gb": "188.1"
  },
  "storage_pools": [
   {
    "id": "ee6f53a3-3b12-4f64-a4cf-3c2127dc900a",
    "name": "maas",
    "type": "dir",
    "path": "/var/lib/libvirt/maas-images",
    "total": 489479471104,
    "used": 0,
    "available": 489479471104
   }
  ],
...
  "resources": {
   "cores": {
    "allocated_tracked": 0,
    "allocated_other": 0,
    "free": 0
   },
   "memory": {
    "hugepages": {
     "allocated_tracked": 0,
     "allocated_other": 0,
     "free": 0
    },
    "general": {
     "allocated_tracked": 0,
     "allocated_other": 0,
     "free": 0
    }
   },
   "storage": {
    "allocated_tracked": 0,
    "allocated_other": 0,
    "free": 0
   },
   "vm_count": {
    "tracked": 0,
    "other": 0
   },
   "interfaces": [],
   "vms": [],
   "numa": []
  },
...
}
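
Given that payload, one plausible client-side guard (sketched below with illustrative names; the actual fix may well differ) is to use the "resources" block only when it is actually populated and fall back to the legacy total/used/available counters otherwise, since virsh vmhosts return "resources" zeroed out:

// Sketch only: prefer "resources" when populated (LXD vmhosts), otherwise
// fall back to the legacy total/used/available counters (virsh vmhosts).
// Shapes are simplified from the websocket payload above, not the real
// maas-ui types.
interface CoreResources {
  allocated_tracked: number;
  allocated_other: number;
  free: number;
}

interface PodPayload {
  type: "virsh" | "lxd";
  total: { cores: number };
  used: { cores: number };
  available: { cores: number };
  resources: { cores: CoreResources };
}

function coreTotals(pod: PodPayload): { allocated: number; free: number } {
  const { allocated_tracked, allocated_other, free } = pod.resources.cores;
  if (allocated_tracked + allocated_other + free > 0) {
    // LXD vmhosts: per-project resource accounting is available.
    return { allocated: allocated_tracked + allocated_other, free };
  }
  // virsh vmhosts: "resources" is all zeros, so use the legacy counters,
  // which are correct in the payload above (0 used of 4 cores).
  return { allocated: pod.used.cores, free: pod.available.cores };
}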

Changed in maas:
milestone: none → 3.0.0-rc1
importance: Undecided → High
status: New → Triaged
Alberto Donato (ack)
Changed in maas:
assignee: nobody → Alberto Donato (ack)
status: Triaged → In Progress
Revision history for this message
Alberto Donato (ack) wrote :

I tested on a 2.9.2 deployment; I can't reproduce the issue.

Changed in maas:
status: In Progress → Fix Committed
Changed in maas-ui:
status: New → Fix Released
Changed in maas:
status: Fix Committed → Fix Released