ceph-volume lvm list <device> calls blkid numerous times for differrent devices

Bug #1908375 reported by Dariusz Gadomski
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ubuntu Cloud Archive
Fix Released
Undecided
Unassigned
Queens
Fix Released
Undecided
Unassigned
ceph (Ubuntu)
Fix Released
Medium
Unassigned
Bionic
Fix Released
Medium
Dariusz Gadomski
Focal
Fix Released
Medium
Unassigned
Groovy
Fix Released
Medium
Unassigned

Bug Description

[Impact]

 * Every ceph-volume list lvm <device> call invokes blkid for numerous PARTUUIDs. For some setups with many slower IO devices this can make this call to run for minutes without any actual justification for that.
In fact, the upstream ceph approach changed in this matter and post-bionic releases already have ceph-volume that does not invoke blkid at all in this context making the call much faster.

Please examine the attached ceph-volume.log fragment for a ceph-volume call, the accumulated blkid calls take around 1 min 7 s.

[Test Case]

 * Setup a ceph-osd with numerous block devices with long access time for blkid. Preferably use automation tools like juju (with ceph charm) or ansible to make sure ceph-volume calls work well when automated.
 * Run
time ceph-volume --log-path ceph-volume.log --log-level debug lvm list <device>
on one of them and check the log to see if the execution time is not wasted on numerous blkid calls.

[Where problems could occur]

 * The output format of does not change. The only noticeable change will be visible in the (created on request with the --log-level --log-path switched) log file. Such log will also be missing the blkid calls. I don't see a reason for parsing that log file, but if any system (like CI) is doing that it may need a manual intervention to update its expected result.

[Other Info]

 * The fix to this is available for Focal and beyond.
 * Xenial is not affected due to lack of ceph-volume in its ceph release.

Related branches

Revision history for this message
Dariusz Gadomski (dgadomski) wrote :
Changed in ceph (Ubuntu):
status: New → Fix Released
Changed in ceph (Ubuntu Bionic):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Dariusz Gadomski (dgadomski)
Changed in ceph (Ubuntu):
assignee: Dariusz Gadomski (dgadomski) → nobody
Dan Hill (hillpd)
Changed in ceph (Ubuntu Groovy):
status: New → Fix Released
Changed in ceph (Ubuntu Focal):
status: New → Fix Released
Changed in ceph (Ubuntu):
importance: Undecided → Medium
Changed in ceph (Ubuntu Groovy):
importance: Undecided → Medium
Changed in ceph (Ubuntu Focal):
importance: Undecided → Medium
Revision history for this message
Dariusz Gadomski (dgadomski) wrote :
Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

The code in this change looks OK, but it's also a lot of change to propose backporting to a stable release. Unfortunately, Luminous is EoL upstream or we could push to land it upstream and bring it in via an upstream point release which would be a bit safer for us.

Because this is fairly large of a change, I'd like to see a more comprehensive test case for the SRU to ensure that we aren't regressing other things by removing the blkid calls. As called out in the regression potential, things that read the output of ceph-volume may be impacted, so I'd like to see that covered as well in the test case.

Revision history for this message
Dariusz Gadomski (dgadomski) wrote :

I have performed a basic set of sanity testing on the patched ceph-volume - no issues noticed nor difference in the output format.

I have also run ceph-volume tests with tox (logs attached: ceph-volume-vanilla.tox.log - the unpatched version, ceph-volume-patched.tox.log - patched version). In both cases status is identical:

grep -w passed *.tox.log
ceph-volume-patched.tox.log:========================= 1829 passed in 3.91 seconds ==========================
ceph-volume-patched.tox.log:======================= 1828 passed, 1 skipped in 2.99s ========================
ceph-volume-vanilla.tox.log:========================= 1829 passed in 4.04 seconds ==========================
ceph-volume-vanilla.tox.log:======================= 1828 passed, 1 skipped in 2.97s ========================

Revision history for this message
Dariusz Gadomski (dgadomski) wrote :

tox log for patched ceph-volume

Revision history for this message
Dariusz Gadomski (dgadomski) wrote :

I have also made an attempt to run tasks.ceph_deploy test suite with vstart as this seems to be the only one that makes use of `ceph-volume`, but I have failed due to Python2/Python3 syntax issues.

I have set up venv with Python2 (since qa/tasks/vstart_runner.py is not Python3 compatible) with teuthology (pip install git+https://github.com/ceph/teuthology@luminous#egg=teuthology[test]).

Test run result:
Traceback (most recent call last):
  File "../qa/tasks/vstart_runner.py", line 1086, in <module>
    exec_test()
  File "../qa/tasks/vstart_runner.py", line 893, in exec_test
    args=["ps", "-u"+str(os.getuid())]
  File "../qa/tasks/vstart_runner.py", line 296, in run
    proc.wait()
  File "../qa/tasks/vstart_runner.py", line 164, in wait
    self.stdout.write(out)
TypeError: unicode argument expected, got 'str'

I'm not sure how to successfully run the suite, but I'll keep trying.

Revision history for this message
Dariusz Gadomski (dgadomski) wrote :

I have successfully tested the patched version with a 3 ceph-osd nodes setup, each with 10 or 20 OSDs. This setup has been deployed with juju charms.

No problems were observed nor differences compared to a vanilla version.

Attaching ceph-volume.logs from an example node with 10 and 20 volumes.

Revision history for this message
Dariusz Gadomski (dgadomski) wrote :

ceph-volume.log from a node with 20 volumes.

description: updated
description: updated
Revision history for this message
Edward Hope-Morley (hopem) wrote :

upload to bionic unapproved queue on 11th May - https://launchpadlibrarian.net/538166201/ceph_12.2.13-0ubuntu0.18.04.8_source.changes and still awaiting sru team approval.

Revision history for this message
Łukasz Zemczak (sil2100) wrote : Proposed package upload rejected

An upload of ceph to bionic-proposed has been rejected from the upload queue for the following reason: "The upload includes some garbage files, e.g. */series.orig leftovers - please clean those up and re-upload. Also, before we go on with accepting the upload, I wanted to know if Chris's concerns regarding the test case have been addressed - is it now sufficient enough? Since I don't have the required expertise, I'd like someone from the OS team to give a +1 on that before we proceed as the patchset is indeed quite big.".

Revision history for this message
Dariusz Gadomski (dgadomski) wrote :

I have reuploaded the patch removing unnecessary entries (debdiff attached for reference).

Revision history for this message
James Page (james-page) wrote :

Thanks Dariusz - not quite sure where the garbage came from as the ubuntu/bionic branch in the git repo does not contain it.

This may have been my error when I uploaded the first one.

description: updated
Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

With the latest details and test-plan, I'm more comfortable with this change, thanks!

Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Looking much more crisp, thank you!

Changed in ceph (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-bionic
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Dariusz, or anyone else affected,

Accepted ceph into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ceph/12.2.13-0ubuntu0.18.04.8 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Revision history for this message
Dariusz Gadomski (dgadomski) wrote :

I have verified the package in 2 ways:
* manually (no traces of blkid calls in the log)
* automated - by using juju charms to set up a ceph deployment

In both cases there were no issues, no regressions noticed.

Verification is successful.

tags: added: verification-done verification-done-bionic
removed: verification-needed verification-needed-bionic
Revision history for this message
Łukasz Zemczak (sil2100) wrote :

I assume you were testing the -proposed packages! I trust your verification, but for paperwork reasons please always include the package version number in the comment.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 12.2.13-0ubuntu0.18.04.8

---------------
ceph (12.2.13-0ubuntu0.18.04.8) bionic; urgency=medium

  * d/p/lp1908375*.patch: remove blkid calls from ceph-volume lvm list
    to improve performance/experience in systems with large numbers of
    slow disks (LP: #1908375).

 -- Dariusz Gadomski <email address hidden> Mon, 07 Jun 2021 16:39:26 +0200

Changed in ceph (Ubuntu Bionic):
status: Fix Committed → Fix Released
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for ceph has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Dariusz Gadomski (dgadomski) wrote :

Sorry for not stating that explicitly - yes, I have tested the -proposed version (
12.2.13-0ubuntu0.18.04.8).

Revision history for this message
James Page (james-page) wrote : Please test proposed package

Hello Dariusz, or anyone else affected,

Accepted ceph into queens-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:queens-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-queens-needed to verification-queens-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-queens-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-queens-needed
Revision history for this message
James Page (james-page) wrote :

======
Totals
======
Ran: 99 tests in 1202.9744 sec.
 - Passed: 89
 - Skipped: 9
 - Expected Fail: 0
 - Unexpected Success: 0
 - Failed: 1
Sum of execute time for each test: 486.3905 sec.

Failed test is expected.

Revision history for this message
James Page (james-page) wrote : Update Released

The verification of the Stable Release Update for ceph has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
James Page (james-page) wrote :

This bug was fixed in the package ceph - 12.2.13-0ubuntu0.18.04.8~cloud0
---------------

 ceph (12.2.13-0ubuntu0.18.04.8~cloud0) xenial-queens; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 ceph (12.2.13-0ubuntu0.18.04.8) bionic; urgency=medium
 .
   * d/p/lp1908375*.patch: remove blkid calls from ceph-volume lvm list
     to improve performance/experience in systems with large numbers of
     slow disks (LP: #1908375).

Mathew Hodson (mhodson)
Changed in cloud-archive:
status: New → Fix Released
tags: removed: verification-queens-needed
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.