Data corruption in qemu_rbd_co_block_status

Bug #1968258 reported by Markus Schade
This bug affects 3 people
Affects: qemu (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Hi Christian,

we have started testing jammy and triggered a qemu bug which has already been fixed upstream.

The qemu 6.2 in jammy currently has a buggy implementation of .bdrv_co_block_status in block/rbd.
This can result in data corruption and/or crash of the instance.

https://tracker.ceph.com/issues/53784

Please consider backporting the following patches into qemu jammy before the release:

https://git.qemu.org/?p=qemu.git;a=patch;h=9e302f64bb407a9bb097b626da97228c2654cfee
https://git.qemu.org/?p=qemu.git;a=patch;h=fc176116cdea816ceb8dd969080b2b95f58edbc0
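For context on the second patch: Ceph issue 53784, as described upstream, is about rbd_diff_iterate2 reporting wrong offsets when the starting offset is not aligned to an object boundary. The sketch below is illustrative Python, not QEMU or librbd code. It assumes the default 4 MiB RBD object size (this report does not state the image's actual object size) and takes the offset and length from frame #4 of the backtrace in this report. `object_span` and `ASSUMED_OBJECT_SIZE` are made-up names.

```python
# Assumption: 4 MiB objects (the RBD default). The report does not state
# the object size of the affected image.
ASSUMED_OBJECT_SIZE = 4 * 1024 * 1024


def object_span(offset, length, object_size=ASSUMED_OBJECT_SIZE):
    """Return (object_index, bytes_into_object, crosses_boundary).

    object_index is the object that contains `offset`, bytes_into_object is
    how far `offset` lies past the start of that object, and
    crosses_boundary tells whether the request runs into the next object.
    """
    index, head = divmod(offset, object_size)
    crosses = head + length > object_size
    return index, head, crosses


# Values taken from frame #4 of the backtrace (offset=4259840, bytes=12288).
index, head, crosses = object_span(4259840, 12288)
print(f"object {index}, {head} bytes past its start, crosses boundary: {crosses}")
# prints: object 1, 65536 bytes past its start, crosses boundary: False
```

Under that assumption the failing request starts 64 KiB into the second object, so it is the kind of unaligned request the Ceph issue describes.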

A reproducer is in the Ceph tracker. Trying to create a qcow2 snapshot from a running rbd-backed virtual machine will lead to a crash of the virtual machine, e.g.

# virsh snapshot-create-as --domain vm-123 --no-metadata --disk-only --diskspec sda,file=/var/lib/libvirt/qemu/snapshot/disk-123.qcow2

Resulting core dump:

#0 0x00007fbaee61f18b in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fbaee5fe859 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007fbaee5fe729 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007fbaee60ff36 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00007fbaed64663f in qemu_rbd_co_block_status (bs=<optimized out>, want_zero=<optimized out>, offset=4259840, bytes=12288, pnum=0x7fba27bfb980, map=<optimized out>, file=0x7fba27bfb6c8) at ../../block/rbd.c:1355
#5 0x00005566ab8f574c in bdrv_co_block_status (bs=0x5566ac468220, want_zero=want_zero@entry=false, offset=4259840, bytes=12288, pnum=pnum@entry=0x7fba27bfb980, map=map@entry=0x7fba27bfb770, file=0x7fba27bfb778) at ../../block/io.c:2489
#6 0x00005566ab8f582b in bdrv_co_block_status (bs=bs@entry=0x5566ac62ffd0, want_zero=want_zero@entry=false, offset=offset@entry=4259840, bytes=bytes@entry=12288, pnum=pnum@entry=0x7fba27bfb980, map=map@entry=0x0, file=0x0) at ../../block/io.c:2557
#7 0x00005566ab8f8589 in bdrv_co_common_block_status_above (bs=bs@entry=0x5566acdf0400, base=base@entry=0x0, include_base=include_base@entry=false, want_zero=want_zero@entry=false, offset=offset@entry=4259840, bytes=bytes@entry=12288, pnum=0x7fba27bfb980, map=0x0, file=0x0, depth=0x7fba27bfb824) at ../../block/io.c:2667
#8 0x00005566ab8c975a in bdrv_common_block_status_above (bs=0x5566acdf0400, base=base@entry=0x0, include_base=include_base@entry=false, want_zero=want_zero@entry=false, offset=4259840, bytes=bytes@entry=12288, pnum=0x7fba27bfb980, map=0x0, file=0x0, depth=0x0) at block/block-gen.c:444
#9 0x00005566ab8f8920 in bdrv_co_is_zero_fast (bs=bs@entry=0x5566acdf0400, offset=<optimized out>, bytes=12288) at ../../block/io.c:2755
#10 0x00005566ab91c924 in is_zero_cow (m=0x5566ac938660, bs=0x5566acdf0400) at ../../block/qcow2.c:2477
#11 handle_alloc_space (l2meta=<optimized out>, bs=0x5566acdf0400) at ../../block/qcow2.c:2477
#12 qcow2_co_pwritev_task (l2meta=<optimized out>, qiov_offset=<optimized out>, qiov=0x5566ad374160, bytes=<optimized out>, offset=<optimized out>, host_offset=<optimized out>, bs=0x5566acdf0400) at ../../block/qcow2.c:2550
#13 qcow2_co_pwritev_task_entry (task=<optimized out>) at ../../block/qcow2.c:2594
#14 0x00005566ab919866 in qcow2_add_task (bs=bs@entry=0x5566acdf0400, pool=pool@entry=0x0, func=func@entry=0x5566ab91c640 <qcow2_co_pwritev_task_entry>, subcluster_type=subcluster_type@entry=QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN, host_offset=471040, offset=offset@entry=4272128, bytes=4096, qiov=0x5566ad374160, qiov_offset=0, l2meta=0x5566ac938660) at ../../block/qcow2.c:2249
#15 0x00005566ab919fe7 in qcow2_co_pwritev_part (bs=0x5566acdf0400, offset=4272128, bytes=4096, qiov=0x5566ad374160, qiov_offset=0, flags=<optimized out>) at ../../block/qcow2.c:2645
#16 0x00005566ab8f9099 in bdrv_driver_pwritev (bs=bs@entry=0x5566acdf0400, offset=offset@entry=4272128, bytes=bytes@entry=4096, qiov=qiov@entry=0x5566ad374160, qiov_offset=qiov_offset@entry=0, flags=flags@entry=0) at ../../block/io.c:1252
#17 0x00005566ab8fb15f in bdrv_aligned_pwritev (child=0x5566ac6271e0, req=0x7fba27bfbe00, offset=4272128, bytes=4096, align=<optimized out>, qiov=0x5566ad374160, qiov_offset=0, flags=0) at ../../block/io.c:2126
#18 0x00005566ab8fbba8 in bdrv_co_pwritev_part (child=0x5566ac6271e0, offset=<optimized out>, offset@entry=4272128, bytes=<optimized out>, bytes@entry=4096, qiov=<optimized out>, qiov@entry=0x5566ad374160, qiov_offset=<optimized out>, qiov_offset@entry=0, flags=flags@entry=0) at ../../block/io.c:2314
#19 0x00005566ab8ec21d in blk_co_do_pwritev_part (blk=0x5566ad94e410, offset=4272128, bytes=4096, qiov=0x5566ad374160, qiov_offset=qiov_offset@entry=0, flags=0) at ../../block/block-backend.c:1283
#20 0x00005566ab8ec38f in blk_aio_write_entry (opaque=0x5566adfeefc0) at ../../block/block-backend.c:1467
#21 0x00005566ab9ddaa3 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../../util/coroutine-ucontext.c:173
#22 0x00007fbaee637660 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#23 0x00007ffc5fa8c270 in ?? ()
#24 0x0000000000000000 in ?? ()
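The numbers in frames #15 and #9 fit together: a 4096-byte guest write at offset 4272128 lands in a previously unallocated qcow2 cluster (frame #14), and qcow2 first asks whether the part of the cluster in front of the write is zero in the rbd image underneath. A rough Python sketch of that arithmetic, assuming the qcow2 default cluster size of 64 KiB (the report does not state the cluster size of the snapshot file). `cow_head` is a hypothetical helper, not a QEMU function.

```python
# Assumption: 64 KiB clusters (the qcow2 default); not stated in the report.
ASSUMED_CLUSTER_SIZE = 64 * 1024


def cow_head(write_offset, cluster_size=ASSUMED_CLUSTER_SIZE):
    """Return (cluster_start, head_length) for a write into a fresh cluster.

    head_length is the number of bytes between the start of the cluster and
    the start of the write, i.e. the region that has to be copied or shown
    to be zero before the write can complete.
    """
    cluster_start = write_offset - (write_offset % cluster_size)
    return cluster_start, write_offset - cluster_start


# Write offset taken from frame #15 of the backtrace.
start, head = cow_head(4272128)
print(start, head)  # prints: 4259840 12288
```

These values match the offset=4259840 and bytes=12288 passed down to qemu_rbd_co_block_status in frame #4, which is where the assertion fails.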

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in qemu (Ubuntu):
status: New → Confirmed

Christian Ehrhardt (paelzer) wrote :

Hi Markus,
your bug reports are always great, well prepared, reliable and useful!
Thank you so much already.

Both changes LGTM, are easily applied and should cause no side effects AFAICS.
I've added them to another case that I had already queued for review and upload:

PPA:
https://launchpad.net/~paelzer/+archive/ubuntu/lp-1246924-enable-glusterfs/
MR:
https://code.launchpad.net/~paelzer/ubuntu/+source/qemu/+git/qemu/+merge/418926

Changed in qemu (Ubuntu):
status: Confirmed → In Progress

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:6.2+dfsg-2ubuntu6

---------------
qemu (1:6.2+dfsg-2ubuntu6) jammy; urgency=medium

  * debian/control[-in]: no more disable glusterfs in Ubuntu (LP: #1246924)
  * Fix diff handling on ceph that can cause data corruption (LP: #1968258)
      - d/p/u/lp-1968258-block-rbd-fix-handling-of-holes-in-.bdrv_co.patch
      - d/p/u/lp-1968258-block-rbd-workaround-for-ceph-issue-53784.patch

 -- Christian Ehrhardt <email address hidden> Fri, 08 Apr 2022 09:36:34 +0200

Changed in qemu (Ubuntu):
status: In Progress → Fix Released

Markus Schade (lp-markusschade) wrote :

Hi Christian,

thank you very much for getting this out of the door so fast!
I can confirm that the new version with the patches resolves the issues.

Christian Ehrhardt (paelzer) wrote :

You are welcome,
trying to resolve things fast is the best I can give you in return for your always fast and large scale testing ending in well written bugs :-)
