LVMiSCSI driver can't issue direct I/O through tgtd to iscsi volume

Bug #1336568 reported by Mitsuhiro Tanino
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Cinder
Fix Released
Undecided
Mitsuhiro Tanino
cinder (Ubuntu)
Fix Released
Undecided
Billy Olsen
Trusty
Fix Released
Undecided
Billy Olsen

Bug Description

I found a problem which LVMiSCSI driver can't issue direct I/O through tgtd to iscsi volume.

In current implementation, qemu-kvm opens device(storage volume) using cache='node'at nova side, however, tgtd opens device without "--bsoflags direct" at cinder side.

Therefore, I/O from guest instances are cached at control node even though the compute node issues O_DIRECT I/O to iSCSI volume.
As a result, if control node has a crash, cached data will be lost. This causes data lost problem of guest instance.

I will propose a fix of this issue.

Here are test environment and confirmation results.
(1) Control node
    - Control node has nova, cinder(c-sch, c-api, c-vol), glance, horizon
      services.
    - Use LVMiSCSI driver for cinder backend.
(2) Compute node has n-cpu and n-net services.

[Confirmation at compute node]
On a compute node, qemu opens device file using cache='none'. This means instance can issue direct I/O from guest to the device.

[root@compute ~]# cat /etc/libvirt/qemu/instance-0000000b.xml
<domain type='kvm'>
  <name>instance-0000000b</name>
  <uuid>9e1eb5cc-4c40-4023-bca5-a7d1720c6f51</uuid>
....
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      ######### Open the device without cache.
      <source dev='/dev/disk/by-path/ip-10.16.42.67:3260-iscsi-iqn.2010-10.org.openstack:volume-fdd23217-6e95-4aee-a586-6ef174567ba5-lun-1'/>
      <target dev='vdc' bus='virtio'/>
      <serial>fdd23217-6e95-4aee-a586-6ef174567ba5</serial>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
....
</domain>

Confirm a file descriptor whether the device is opened with O_DIRECT or not at compute node.

=> qemu Process ID is "24836"
[root@compute ~]# ps uax | grep qemu
root 11421 0.0 0.0 112672 912 pts/6 S+ 17:13 0:00 grep --color=auto qemu
qemu 24836 13.8 16.0 4638484 1312668 ? Sl Jun29 462:12 /usr/bin/qemu-system-x86_64 -machine accel=kvm -name instance-0000000b .....

=> Device file of iscsi cinder volume is "/dev/sde".
[root@compute ~]# ls -la /dev/disk/by-path/
.....
-rw-r--r-- 1 root root 349525333 Jun 27 10:01 ip-10.16.42.67
lrwxrwxrwx 1 root root 9 Jun 30 00:16 ip-10.16.42.67:3260-iscsi-iqn.2010-10.org.openstack:volume-fdd23217-6e95-4aee-a586-6ef174567ba5-lun-1 -> ../../sde

=> "fd18" is infomation of /dev/sde
[root@compute ~]# ls -la /proc/24836/fd
total 0
dr-x------ 2 qemu qemu 0 Jun 29 09:40 .
dr-xr-xr-x 9 qemu qemu 0 Jun 29 09:31 ..
.....
lrwx------ 1 qemu qemu 64 Jun 29 09:40 18 -> /dev/sde

=> The flags is "02140002". O_DIRECT flag is "0x40000". This flag is raised at compute node side.
[root@compute ~]# cat /proc/24836/fdinfo/18
pos: 10737418240
flags: 02140002

[Confirmation at control node]
Confirm iscsi target status and exported disk.
Backing store path is /dev/stack-volumes/volume-fdd23217-6e95-4aee-a586-6ef174567ba5

[mtanino@control ~]$ sudo tgt-admin -s
.....
Target 3: iqn.2010-10.org.openstack:volume-fdd23217-6e95-4aee-a586-6ef174567ba5
    System information:
        Driver: iscsi
...
        LUN: 1
...
            Backing store path: /dev/stack-volumes/volume-fdd23217-6e95-4aee-a586-6ef174567ba5
            Backing store flags:
    Account information:
    ACL information:
        ALL

=>Condirm device mapper file of the backing store

[mtanino@control ~]$ ls -la /dev/disk/by-id/ | grep stack
lrwxrwxrwx 1 root root 10 Jun 30 00:16 dm-name-stack--volumes-volume--fdd23217--6e95--4aee--a586--6ef174567ba5 -> ../../dm-0

=> tgtd Process ID is "31010"
[mtanino@control ~]$ ps aux | grep tgtd
root 31010 2.0 0.0 476584 900 ? Ssl Jun30 52:27 /usr/sbin/tgtd -f

=> "fd11" is infomation of /dev/dm-0
[mtanino@control ~]$ sudo ls -la /proc/31010/fd
...
lrwx------ 1 root root 64 Jul 1 16:11 11 -> /dev/dm-0
lrwx------ 1 root root 64 Jul 1 16:11 12 -> /dev/sdb2
...

=> The flags is "0100002". O_DIRECT flag is "0x40000". This flag is not raised at control node side.
[mtanino@control ~]$ sudo cat /proc/31010/fdinfo/11
pos: 0
flags: 0100002

Regards,
Mitsuhiro Tanino

========================================================================
[Impact]

 * May see data loss without the ability to use write-through caching
   (write-cache off) option instead of write-back (write-cache on)
   option for iscsi targets.

[Test Case]

 * Configure Cinder to use LVMiSCSIDriver
 * Create cinder volume (cinder create --display-name foo 1G)
 * Attach volume to nova instance (nova volume-attach my-instance <vol-uuid>)

 * Observe the write-cache policy specified per cinder volume (found in)
   - /var/lib/cinder/volumes/volume-<uuid>

 * Observe above information (detailed by Mitsuhiro)

[Regression Potential]

 * Low risk of regression as the feature is enabled through a
   configurable option in which default value takes original behavior.

Related branches

description: updated
description: updated
description: updated
Changed in cinder:
assignee: nobody → Mitsuhiro Tanino (mitsuhiro-tanino)
description: updated
Revision history for this message
Mitsuhiro Tanino (mitsuhiro-tanino) wrote :

After issuing direct I/O from guest instance, dirty cache of control node increases from 0KB to 202436 KB.
This is same size of I/O from guest instance.

[Guest]
[root@driver-test fio]# dd if=/dev/zero of=/dev/vdd oflag=direct bs=1024k count=200
200+0 records in
200+0 records out

[Control node]
[root@control rpmbuild]# cat /proc/meminfo
MemTotal: 8164312 kB
MemFree: 971492 kB
MemAvailable: 3838688 kB
Buffers: 1509816 kB
Cached: 1413724 kB
SwapCached: 37264 kB
Active: 4148700 kB
Inactive: 2574248 kB
Active(anon): 2796988 kB
Inactive(anon): 1016204 kB
Active(file): 1351712 kB
Inactive(file): 1558044 kB
Unevictable: 26172 kB
Mlocked: 26172 kB
SwapTotal: 4079612 kB
SwapFree: 2563736 kB
Dirty: 202436 kB <**************** Cached I/O
Writeback: 0 kB
AnonPages: 3797040 kB
Mapped: 56800 kB
Shmem: 2240 kB
Slab: 269472 kB
SReclaimable: 210844 kB
SUnreclaim: 58628 kB
KernelStack: 5624 kB
PageTables: 68488 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 8161768 kB
Committed_AS: 10106272 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 300076 kB
VmallocChunk: 34359336928 kB
HardwareCorrupted: 0 kB
AnonHugePages: 237568 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 148068 kB
DirectMap2M: 8230912 kB

Revision history for this message
Mitsuhiro Tanino (mitsuhiro-tanino) wrote :

There are two parameters related to cache.

(1) "write-cache off"
In default setting the write cache is enabled(write-cache on). Therefore I/O is issued via write-back mode
and the I/O is cached on dirty cache.

If we use "write-cache off", Write I/O is issued via write-through mode and the write I/O is not cached.
Read I/O is still cached even where we user this parameter. So read I/O can keep good performance.

(2) "--bsoflags direct"
When we use this parameter, both Read and Write I/O are issued without dirty cache and buffer cache.
Using this parameter, we can suppress increasing buffer cache via tgtd's I/O.

I think these parameters should be used case by case basis.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/104714

Changed in cinder:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/104714
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=617e59bde660f919df0818611b911ae35f8b7247
Submitter: Jenkins
Branch: master

commit 617e59bde660f919df0818611b911ae35f8b7247
Author: Mitsuhiro Tanino <email address hidden>
Date: Tue Jul 8 15:52:11 2014 -0400

    Configure write cache option of tgtd iscsi driver

    Cinder LVMiSCSI driver is using default value of write-cache parameter
    of tgtd iscsi driver. In this setting, write I/O from guest instance is
    cached on dirty cache of a host.(write-back mode)

    In this case, data lost may be occurred if the host crashes before
    flushing dirty cache. This may cause a lot of instances to lose
    their data.

    In order to avoid this issue, it is better to turn off the write cache.
    (write-through mode)

    This patch adds "iscsi_write_cache" parameter to configure a behavior of
    write cache. The default value is "iscsi_write_cache=on".(write-back mode)

    Closes-Bug: 1336568
    DocImpact

    Change-Id: I7a495bc6118d4254576bdf1620a04ac537b3078d
    Signed-off-by: Mitsuhiro Tanino <email address hidden>

Changed in cinder:
status: In Progress → Fix Committed
Changed in cinder:
milestone: none → juno-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in cinder:
milestone: juno-2 → 2014.2
Changed in cinder (Ubuntu):
assignee: nobody → Billy Olsen (billy-olsen)
Revision history for this message
Billy Olsen (billy-olsen) wrote :

This patch was included in the juno release of cinder and thus is available already in utopic, vivid, etc. It is not available in icehouse/trusty - which this patch is for.

description: updated
tags: added: sts
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in cinder (Ubuntu):
status: New → Confirmed
Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Mitsuhiro, or anyone else affected,

Accepted cinder into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cinder/1:2014.1.5-0ubuntu2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cinder (Ubuntu Trusty):
status: New → Fix Committed
tags: added: verification-needed
Revision history for this message
Billy Olsen (billy-olsen) wrote :

Was able to verify the fix for this bug today. Installed cinder from the trusty-proposed pocket and ran the following tests to confirm:

# Test one, ensure default option remains to write-cache on
1. create volume
2. attach iscsi volume to instance
3. Verify generated xml in /var/lib/cinder/volumes/<volume-id> is generated with write-cache on.

# Change the iscsi-write-cache to off and restart cinder volumes
1. Set iscsi_write_cache = off in /etc/cinder/cinder.conf
2. Create lvm volume
3. Attach via iscsi to instance
4. Verify generated xml in /var/lib/cinder/volumes/<volume-id> is generated with write-cache off

tags: added: verification-done
removed: verification-needed
Revision history for this message
Chris J Arges (arges) wrote : Update Released

The verification of the Stable Release Update for cinder has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Changed in cinder (Ubuntu):
status: Confirmed → Fix Released
Changed in cinder (Ubuntu Trusty):
assignee: nobody → Billy Olsen (billy-olsen)
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cinder - 1:2014.1.5-0ubuntu2

---------------
cinder (1:2014.1.5-0ubuntu2) trusty; urgency=medium

  * Enable iscsi_write_cache option for tgtadm backends (LP: #1336568):
    - d/p/tgtadmin-iscsi-write-cache-config.patch - Includes backport of
      change from the juno release for enabling iscsi write cache policy
      for tgtadm.

 -- Billy Olsen <email address hidden> Mon, 27 Jul 2015 17:35:57 -0700

Changed in cinder (Ubuntu Trusty):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.