/var/snap/maas/common/maas/boot-resources/cache grows without bound

Bug #1947629 reported by Dan Streetman
This bug affects 1 person
Affects: MAAS
Status: Fix Released
Importance: Medium
Assigned to: Alberto Donato

Bug Description

Our MAAS installation ran out of disk space because the /var/snap/maas/common/maas/boot-resources/cache directory grew to 125G.

The directory appears to contain files with long random character filenames that are never cleaned up; some have timestamps from over 8 months ago.

Tags: seg

Revision history for this message
Dan Streetman (ddstreet) wrote :

We currently have the maas snap installed at version:

snap-id: shY22YTZ3RhJJDOj0MfmShTNZTEb1Jiq
tracking: 2.9/stable
refresh-date: 2021-04-16
installed: 2.9.2-9165-g.c3e7848d1 (12555) 149MB -

tags: added: seg
Revision history for this message
Alberto Donato (ack) wrote :

MAAS has logic to clean up the cache directory.
It looks at every file in the directory and removes those that report a hard link count of 1 (meaning they are not used anywhere else).

You can check this with
  stat -c '%h %n' /var/snap/maas/common/maas/boot-resources/cache/*

Could you please run that and see if any file reports 1?
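
For anyone who prefers to script the check, the following is a minimal Python sketch of the same test the stat command performs. The function name unreferenced_files is hypothetical and this is not the MAAS implementation; it only assumes that cleanup candidates are regular files with a hard link count of 1, as described above.

```python
import os


def unreferenced_files(cache_dir):
    """Return paths of regular files in cache_dir with a hard link count of 1.

    Hypothetical helper for illustration; not part of MAAS.
    """
    candidates = []
    for entry in os.scandir(cache_dir):
        # Skip directories and symlinks; only plain files are considered.
        if not entry.is_file(follow_symlinks=False):
            continue
        # st_nlink is the value that `stat -c '%h'` prints.
        if entry.stat(follow_symlinks=False).st_nlink == 1:
            candidates.append(entry.path)
    return sorted(candidates)
```

Calling unreferenced_files("/var/snap/maas/common/maas/boot-resources/cache") should list the same files as piping the stat output through grep "^1 ".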

Changed in maas:
status: New → Incomplete
Revision history for this message
Dan Streetman (ddstreet) wrote :

Sorry, I had to remove the files so we could get MAAS working again; they're gone now.

Alberto Donato (ack)
Changed in maas:
status: Incomplete → New
status: New → Incomplete
Revision history for this message
Kellen Renshaw (krenshaw) wrote :

Looked into this, and after syncing/removing an image, there are 63 files with a refcount of 1 in the cache directory:
ubuntu@maas:/var/snap/maas/common/maas/boot-resources/cache$ stat -c '%h %n' /var/snap/maas/common/maas/boot-resources/cache/* | grep "^1 "
1 /var/snap/maas/common/maas/boot-resources/cache/04a517eab757896f7b2e665ead0ddf5af2be96c92be1398b38dba7e46da575b1
1 /var/snap/maas/common/maas/boot-resources/cache/057c96ed224e0c8fbc6af92a14c00c5d99314c69079ba1e8fe20f0ea3c65d589
...

Concurrent snap info:
snap-id: shY22YTZ3RhJJDOj0MfmShTNZTEb1Jiq
tracking: 2.9/stable
refresh-date: 2021-04-16
installed: 2.9.2-9165-g.c3e7848d1 (12555) 149MB -

I ran strace on the rackd process and found that the cleanup logic is unable to delete at least one file. It calls unlinkat(), which returns EACCES, and the cleanup process throws the following exception:
2021-11-04 17:22:11 twisted.internet.defer: [critical] Unhandled error in Deferred:
2021-11-04 17:22:11 twisted.internet.defer: [critical]
        Traceback (most recent call last):
          File "/snap/maas/12555/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 501, in errback
            self._startRunCallbacks(fail)
          File "/snap/maas/12555/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 568, in _startRunCallbacks
            self._runCallbacks()
          File "/snap/maas/12555/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "/snap/maas/12555/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1475, in gotResult
            _inlineCallbacks(r, g, status)
        --- <exception caught here> ---
          File "/snap/maas/12555/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
            result = result.throwExceptionIntoGenerator(g)
          File "/snap/maas/12555/usr/lib/python3/dist-packages/twisted/python/failure.py", line 491, in throwExceptionIntoGenerator
            return g.throw(self.type, self.value, self.tb)
          File "/snap/maas/12555/lib/python3.8/site-packages/provisioningserver/rpc/boot_images.py", line 153, in _import_boot_images
            yield deferToThread(_run_import, sources, maas_url, **proxies)
          File "/snap/maas/12555/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 250, in inContext
            result = inContext.theWork()
          File "/snap/maas/12555/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 266, in <lambda>
            inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
          File "/snap/maas/12555/usr/lib/python3/dist-packages/twisted/python/context.py", line 122, in callWithContext
            return self.currentContext().callWithContext(ctx, func, *args, **kw)
          File "/snap/maas/12555/usr/lib/python3/dist-packages/twisted/python/context.py", line 85, in callWithContext
            return func(*args,**kw)
          File "/snap/maas/12555/lib/python3.8/site-packages/provisioningserver/utils/twisted.py", line 192, in wrapper
            ...

Revision history for this message
Alberto Donato (ack) wrote :

Thanks for the debugging. We should be able to ignore those errors and continue with the other cleanups.

As a workaround, can you manually remove that directory and check if it fixes the issue?
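
To illustrate the idea of ignoring per-file errors and continuing, here is a rough Python sketch. It is not the actual MAAS fix: the function name cleanup_cache, the unlink parameter, and the logging are assumptions made for illustration. The point is only that an OSError (such as the EACCES seen in the strace) on one file is logged and skipped instead of aborting the whole pass.

```python
import logging
import os

log = logging.getLogger(__name__)


def cleanup_cache(cache_dir, unlink=os.unlink):
    """Remove files with a hard link count of 1, tolerating per-file errors.

    Hypothetical sketch, not the MAAS code. `unlink` is injectable only so
    that the failure path can be exercised in a test.
    Returns (removed_paths, failed_paths).
    """
    removed, failed = [], []
    for entry in os.scandir(cache_dir):
        path = entry.path
        try:
            is_candidate = (
                entry.is_file(follow_symlinks=False)
                and entry.stat(follow_symlinks=False).st_nlink == 1
            )
            if is_candidate:
                unlink(path)
                removed.append(path)
        except OSError as exc:
            # One undeletable file must not stop the rest of the cleanup.
            log.warning("Could not remove %s: %s", path, exc)
            failed.append(path)
    return removed, failed
```

Returning the failed paths keeps the problem visible to the caller, so a file that can never be deleted is still reported on each pass without blocking removal of the others.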

Changed in maas:
milestone: none → next
status: Incomplete → Triaged
importance: Undecided → High
importance: High → Medium
Alberto Donato (ack)
Changed in maas:
assignee: nobody → Alberto Donato (ack)
status: Triaged → In Progress
Changed in maas:
status: In Progress → Fix Committed
Alberto Donato (ack)
no longer affects: maas/3.1
no longer affects: maas/trunk
no longer affects: maas/3.1
Changed in maas:
status: Fix Committed → New
no longer affects: maas/3.1
Changed in maas:
milestone: next → none
no longer affects: maas/3.1
Changed in maas:
status: New → Fix Committed
milestone: none → next
Alberto Donato (ack)
Changed in maas:
milestone: next → 3.1.0-rc1
Alberto Donato (ack)
Changed in maas:
status: Fix Committed → Fix Released