~tyhicks/ubuntu/+source/linux/+git/xenial:cves/CVE-2019-3900

Last commit made on 2019-08-05
Get this branch:
git clone -b cves/CVE-2019-3900 https://git.launchpad.net/~tyhicks/ubuntu/+source/linux/+git/xenial

Branch information

Name: cves/CVE-2019-3900
Repository: lp:~tyhicks/ubuntu/+source/linux/+git/xenial

Recent commits

22f96a8... by Jason Wang

vhost: scsi: add weight support

This patch checks the weight and exits the loop if we exceed it. This
is useful for preventing the scsi kthread from hogging the cpu, which
is guest triggerable.

This addresses CVE-2019-3900.
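
As a sketch (following upstream c1ea02f15ab5; the Xenial backport
context differs slightly), the request-processing loop becomes bounded
by the device weight:

	int c = 0;

	do {
		/* ... dequeue and submit one SCSI request ... */
	} while (likely(!vhost_exceeds_weight(vq, ++c, 0)));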

Cc: Paolo Bonzini <email address hidden>
Cc: Stefan Hajnoczi <email address hidden>
Fixes: 057cbf49a1f0 ("tcm_vhost: Initial merge for vhost level target fabric driver")
Signed-off-by: Jason Wang <email address hidden>
Reviewed-by: Stefan Hajnoczi <email address hidden>
Signed-off-by: Michael S. Tsirkin <email address hidden>

CVE-2019-3900

(backported from commit c1ea02f15ab5efb3e93fc3144d895410bf79fcf2)
[tyhicks: Backport to Xenial:
 - Minor context adjustment in local variables
 - Adjust context around the loop in vhost_scsi_handle_vq()
 - No need to modify vhost_scsi_ctl_handle_vq() since it was added later
   in commit 0d02dbd68c47 ("vhost/scsi: Respond to control queue
   operations")]
Signed-off-by: Tyler Hicks <email address hidden>

d622359... by Jason Wang

vhost_net: fix possible infinite loop

When the rx buffer is too small for a packet, we will discard the vq
descriptor and retry it for the next packet:

	while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
						      &busyloop_intr))) {
	...
		/* On overrun, truncate and discard */
		if (unlikely(headcount > UIO_MAXIOV)) {
			iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1);
			err = sock->ops->recvmsg(sock, &msg,
						 1, MSG_DONTWAIT | MSG_TRUNC);
			pr_debug("Discarded rx packet: len %zd\n", sock_len);
			/* Nothing on this path counts against the loop's
			 * byte/packet budget, so it can repeat forever. */
			continue;
		}
	...
	}

This makes it possible to trigger an infinite while..continue loop
through the cooperation of two VMs, like:

1) Malicious VM1 allocates a 1 byte rx buffer and tries to slow down
   the vhost process as much as possible, e.g. by using indirect
   descriptors or other means.
2) Malicious VM2 generates packets to VM1 as fast as possible.

Fix this by checking against the weight at the end of the RX and TX
loops. This also eliminates other similar cases when:

- userspace is consuming the packets in the meanwhile
- a theoretical TOCTOU attack where the guest moves the avail index
  back and forth to hit the continue path right after vhost finds
  that the guest has just added new buffers

This addresses CVE-2019-3900.
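
As a sketch (following upstream e2412c07f8f3; note that the Xenial
backport adjusts handle_rx() directly and still uses peek_head_len()),
the weight check moves into the loop condition, so even the
discard/continue path is bounded:

	do {
		sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
						      &busyloop_intr);
		if (!sock_len)
			break;
		/* ... receive or discard one packet, updating total_len ... */
	} while (likely(!vhost_exceeds_weight(vq, ++recv_pkts, total_len)));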

Fixes: d8316f3991d20 ("vhost: fix total length when packets are too short")
Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server")
Signed-off-by: Jason Wang <email address hidden>
Reviewed-by: Stefan Hajnoczi <email address hidden>
Signed-off-by: Michael S. Tsirkin <email address hidden>

CVE-2019-3900

(backported from commit e2412c07f8f3040593dfb88207865a3cd58680c0)
[tyhicks: Backport to Xenial:
 - Adjust handle_tx() instead of handle_tx_{copy,zerocopy}() due to
   missing commit 0d20bdf34dc7 ("vhost_net: split out datacopy logic")
 - Minor context adjustments due to the missing iov_limit member of
   the vhost_dev struct, which was added later in commit
   b46a0bf78ad7 ("vhost: fix OOB in get_rx_bufs()")
 - handle_rx() still uses peek_head_len() due to missing and unneeded commit
   030881372460 ("vhost_net: basic polling support")
 - Context adjustment in call to vhost_log_write() in hunk #3 of net.c due to
   missing and unneeded commit cc5e71075947 ("vhost: log dirty page correctly")
 - Context adjustment in hunk #4 due to using break instead of goto out
 - Context adjustment in hunk #5 due to missing and unneeded commit
   c67df11f6e48 ("vhost_net: try batch dequing from skb array")]
Signed-off-by: Tyler Hicks <email address hidden>

21617a6... by Jason Wang

vhost: introduce vhost_exceeds_weight()

We used to have vhost_exceeds_weight() for vhost-net to:

- prevent vhost kthread from hogging the cpu
- balance the time spent between TX and RX

This function could be useful for vsock and scsi as well, so move it
to vhost.c. A device must specify a weight which counts the number of
requests; it can also specify a byte_weight which counts the number of
bytes that have been processed.
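
For reference, the helper as it lands in drivers/vhost/vhost.c
upstream (e82b9b0727ff) looks roughly like the following; treat it as
a sketch, since the Xenial backport context differs:

	bool vhost_exceeds_weight(struct vhost_virtqueue *vq,
				  int pkts, int total_len)
	{
		struct vhost_dev *dev = vq->dev;

		if ((dev->byte_weight && total_len >= dev->byte_weight) ||
		    pkts >= dev->weight) {
			vhost_poll_queue(&vq->poll);
			return true;
		}

		return false;
	}
	EXPORT_SYMBOL_GPL(vhost_exceeds_weight);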

Signed-off-by: Jason Wang <email address hidden>
Reviewed-by: Stefan Hajnoczi <email address hidden>
Signed-off-by: Michael S. Tsirkin <email address hidden>

CVE-2019-3900

(backported from commit e82b9b0727ff6d665fff2d326162b460dded554d)
[tyhicks: Backport to Xenial:
 - Adjust handle_tx() instead of handle_tx_{copy,zerocopy}() due to
   missing commit 0d20bdf34dc7 ("vhost_net: split out datacopy logic")
 - Considerable context adjustments throughout the patch due to the
   missing iov_limit member of the vhost_dev struct, which was added
   later in commit b46a0bf78ad7 ("vhost: fix OOB in get_rx_bufs()")
 - Context adjustment in call to vhost_log_write() in hunk #3 of net.c due to
   missing and unneeded commit cc5e71075947 ("vhost: log dirty page correctly")
 - Context adjustment in hunk #3 of net.c due to using break instead of goto
   out
 - Context adjustment in hunk #4 of net.c due to missing and unneeded commit
   c67df11f6e48 ("vhost_net: try batch dequing from skb array")
 - Don't patch vsock.c since Xenial doesn't have vhost vsock support
 - Adjust context in vhost_dev_init() to account for different local variables
 - Adjust context in struct vhost_dev to account for different struct members]
Signed-off-by: Tyler Hicks <email address hidden>

1f07b44... by Jason Wang

vhost_net: introduce vhost_exceeds_weight()
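
The upstream commit (272f35cba53d) carries no body; it factors the two
existing tx limit checks in net.c into a helper, roughly:

	static bool vhost_exceeds_weight(int pkts, int total_len)
	{
		return total_len >= VHOST_NET_WEIGHT ||
		       pkts >= VHOST_NET_PKT_WEIGHT;
	}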

Signed-off-by: Jason Wang <email address hidden>
Signed-off-by: David S. Miller <email address hidden>

CVE-2019-3900

(backported from commit 272f35cba53d088085e5952fd81d7a133ab90789)
[tyhicks: Backport to Xenial:
 - Minor context adjustment in net.c due to missing commit b0d0ea50e782
   ("vhost_net: introduce helper to initialize tx iov iter")
 - Context adjustment in call to vhost_log_write() in hunk #4 due to missing
   and unneeded commit cc5e71075947 ("vhost: log dirty page correctly")
 - Context adjustment in hunk #4 due to using break instead of goto out]
Signed-off-by: Tyler Hicks <email address hidden>

06794da... by Paolo Abeni <email address hidden>

vhost_net: use packet weight for rx handler, too

Similar to commit a2ac99905f1e ("vhost-net: set packet weight of
tx polling to 2 * vq size"), we need a packet-based limit for
handle_rx, too - otherwise, under rx flood with small packets,
tx can be delayed for a very long time, even without busypolling.

The pkt limit applied to handle_rx must be the same as the one applied
by handle_tx, or we will get unfair scheduling between rx and tx.
Tying such a limit to the queue length makes it less effective for
large queue length values and can introduce large process
scheduler latencies, so a constant value is used - just like the
existing bytes limit.

The selected limit has been validated with PVP[1] performance
test with different queue sizes:

queue size    256   512   1024

baseline      366   354    362
weight 128    715   723    670
weight 256    740   745    733
weight 512    600   460    583
weight 1024   423   427    418

A packet weight of 256 gives peak performance under all the
tested scenarios.

No measurable regression in unidirectional performance tests has
been detected.

[1] https://developers.redhat.com/blog/2017/06/05/measuring-and-comparing-open-vswitch-performance/
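
Illustratively (following upstream db688c24eada; a sketch, not the
exact Xenial diff), the tx-side per-vq macro becomes a constant shared
by both handlers, and handle_rx() gains the same packet bound that
handle_tx() already has:

	#define VHOST_NET_PKT_WEIGHT 256

		/* in handle_rx(): stop after either budget is spent */
		if (unlikely(total_len >= VHOST_NET_WEIGHT) ||
		    unlikely(++recv_pkts >= VHOST_NET_PKT_WEIGHT)) {
			vhost_poll_queue(&vq->poll);
			break;
		}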

Signed-off-by: Paolo Abeni <email address hidden>
Acked-by: Jason Wang <email address hidden>
Signed-off-by: David S. Miller <email address hidden>

CVE-2019-3900

(backported from commit db688c24eada63b1efe6d0d7d835e5c3bdd71fd3)
[tyhicks: Backport to Xenial:
 - Context adjustment in call to mutex_lock_nested() in hunk #3 due to missing
   and unneeded commit aaa3149bbee9 ("vhost_net: add missing lock nesting
   notation")
 - Context adjustment in call to vhost_log_write() in hunk #4 due to missing
   and unneeded commit cc5e71075947 ("vhost: log dirty page correctly")
 - Context adjustment in hunk #4 due to using break instead of goto out]
Signed-off-by: Tyler Hicks <email address hidden>

fad61d2... by haibinzhang

vhost-net: set packet weight of tx polling to 2 * vq size

handle_tx will delay rx for tens or even hundreds of milliseconds when
tx busy polling udp packets with a small length (e.g. a 1 byte udp
payload), because the VHOST_NET_WEIGHT limit takes into account only
the bytes sent, not the number of packets.

The ping latencies shown below were tested between two Virtual
Machines using netperf (UDP_STREAM, len=1), and then another machine
pinged the client:

vq size=256
Packet-Weight   Ping-Latencies (millisecond)
                   min     avg     max
Origin           3.319  18.489  57.303
64               1.643   2.021   2.552
128              1.825   2.600   3.224
256              1.997   2.710   4.295
512              1.860   3.171   4.631
1024             2.002   4.173   9.056
2048             2.257   5.650   9.688
4096             2.093   8.508  15.943

vq size=512
Packet-Weight   Ping-Latencies (millisecond)
                   min     avg     max
Origin           6.537  29.177  66.245
64               2.798   3.614   4.403
128              2.861   3.820   4.775
256              3.008   4.018   4.807
512              3.254   4.523   5.824
1024             3.079   5.335   7.747
2048             3.944   8.201  12.762
4096             4.158  11.057  19.985

The results seem pretty consistent, with a small dip at 2 * vq size.
The ring size is a hint from the device about the burst size it can
tolerate. Based on the benchmarks, set the weight to 2 * vq size, as
sketched below.
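
A sketch of the change (following upstream a2ac99905f1e; exact context
may differ):

	#define VHOST_NET_PKT_WEIGHT(vq) ((vq)->num * 2)

		/* in handle_tx(): bound packets as well as bytes */
		if (unlikely(total_len >= VHOST_NET_WEIGHT ||
			     ++sent_pkts >= VHOST_NET_PKT_WEIGHT(vq))) {
			vhost_poll_queue(&vq->poll);
			break;
		}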

To evaluate this change, additional tests were done using netperf (RR,
TX) between two machines with Intel(R) Xeon(R) Gold 6133 CPU @
2.50GHz, with the vq size tweaked through qemu. The results shown
below do not indicate obvious changes.

vq size=256 TCP_RR                      vq size=512 TCP_RR
size/sessions/+thu%/+normalize%         size/sessions/+thu%/+normalize%
   1/ 1/ -7%/ -2%                          1/ 1/  0%/ -2%
   1/ 4/ +1%/  0%                          1/ 4/ +1%/  0%
   1/ 8/ +1%/ -2%                          1/ 8/  0%/ +1%
  64/ 1/ -6%/  0%                         64/ 1/ +7%/ +3%
  64/ 4/  0%/ +2%                         64/ 4/ -1%/ +1%
  64/ 8/  0%/  0%                         64/ 8/ -1%/ -2%
 256/ 1/ -3%/ -4%                        256/ 1/ -4%/ -2%
 256/ 4/ +3%/ +4%                        256/ 4/ +1%/ +2%
 256/ 8/ +2%/  0%                        256/ 8/ +1%/ -1%

vq size=256 UDP_RR                      vq size=512 UDP_RR
size/sessions/+thu%/+normalize%         size/sessions/+thu%/+normalize%
   1/ 1/ -5%/ +1%                          1/ 1/ -3%/ -2%
   1/ 4/ +4%/ +1%                          1/ 4/ -2%/ +2%
   1/ 8/ -1%/ -1%                          1/ 8/ -1%/  0%
  64/ 1/ -2%/ -3%                         64/ 1/ +1%/ +1%
  64/ 4/ -5%/ -1%                         64/ 4/ +2%/  0%
  64/ 8/  0%/ -1%                         64/ 8/ -2%/ +1%
 256/ 1/ +7%/ +1%                        256/ 1/ -7%/  0%
 256/ 4/ +1%/ +1%                        256/ 4/ -3%/ -4%
 256/ 8/ +2%/ +2%                        256/ 8/ +1%/ +1%

vq size=256 TCP_STREAM                  vq size=512 TCP_STREAM
size/sessions/+thu%/+normalize%         size/sessions/+thu%/+normalize%
  64/ 1/  0%/ -3%                         64/ 1/  0%/  0%
  64/ 4/ +3%/ -1%                         64/ 4/ -2%/ +4%
  64/ 8/ +9%/ -4%                         64/ 8/ -1%/ +2%
 256/ 1/ +1%/ -4%                        256/ 1/ +1%/ +1%
 256/ 4/ -1%/ -1%                        256/ 4/ -3%/  0%
 256/ 8/ +7%/ +5%                        256/ 8/ -3%/  0%
 512/ 1/ +1%/  0%                        512/ 1/ -1%/ -1%
 512/ 4/ +1%/ -1%                        512/ 4/  0%/  0%
 512/ 8/ +7%/ -5%                        512/ 8/ +6%/ -1%
1024/ 1/  0%/ -1%                       1024/ 1/  0%/ +1%
1024/ 4/ +3%/  0%                       1024/ 4/ +1%/  0%
1024/ 8/ +8%/ +5%                       1024/ 8/ -1%/  0%
2048/ 1/ +2%/ +2%                       2048/ 1/ -1%/  0%
2048/ 4/ +1%/  0%                       2048/ 4/  0%/ -1%
2048/ 8/ -2%/  0%                       2048/ 8/  5%/ -1%
4096/ 1/ -2%/  0%                       4096/ 1/ -2%/  0%
4096/ 4/ +2%/  0%                       4096/ 4/  0%/  0%
4096/ 8/ +9%/ -2%                       4096/ 8/ -5%/ -1%

Acked-by: Michael S. Tsirkin <email address hidden>
Signed-off-by: Haibin Zhang <email address hidden>
Signed-off-by: Yunfang Tai <email address hidden>
Signed-off-by: Lidong Chen <email address hidden>
Signed-off-by: David S. Miller <email address hidden>

CVE-2019-3900

(cherry picked from commit a2ac99905f1ea8b15997a6ec39af69aa28a3653b)
Signed-off-by: Tyler Hicks <email address hidden>

07b7d6d... by Willem de Bruijn <email address hidden>

vhost_net: do not stall on zerocopy depletion

Vhost-net has a hard limit on the number of zerocopy skbs in flight.
When reached, transmission stalls. Stalls cause latency, as well as
head-of-line blocking of other flows that do not use zerocopy.

Instead of stalling, revert to copy-based transmission.

Tested by sending two udp flows from guest to host, one with payload
of VHOST_GOODCOPY_LEN, the other too small for zerocopy (1B). The
large flow is redirected to a netem instance with 1MBps rate limit
and deep 1000 entry queue.

  modprobe ifb
  ip link set dev ifb0 up
  tc qdisc add dev ifb0 root netem limit 1000 rate 1MBit

  tc qdisc add dev tap0 ingress
  tc filter add dev tap0 parent ffff: protocol ip \
      u32 match ip dport 8000 0xffff \
      action mirred egress redirect dev ifb0

Before the delay, both flows process around 80K pps. With the delay,
before this patch, both process around 400. After this patch, the
large flow is still rate limited, while the small reverts to its
original rate. See also discussion in the first link, below.

Without rate limiting, {1, 10, 100}x TCP_STREAM tests continued to
send at 100% zerocopy.

The limit in vhost_exceeds_maxpend must be carefully chosen. With
vq->num >> 1, the flows remain correlated. This value happens to
correspond to VHOST_MAX_PENDING for vq->num == 256. Allow smaller
fractions and ensure correctness also for much smaller values of
vq->num, by testing the min() of both explicitly. See also the
discussion in the second link below.
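
A sketch of the adjusted check (following upstream 1e6f74536de0):
cap in-flight zerocopy skbs at min(VHOST_MAX_PEND, vq->num >> 2) and
fall back to copying, instead of stalling, once the cap is reached:

	static bool vhost_exceeds_maxpend(struct vhost_net *net)
	{
		struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
		struct vhost_virtqueue *vq = &nvq->vq;

		return (nvq->upend_idx + UIO_MAXIOV - nvq->done_idx)
			% UIO_MAXIOV > min_t(unsigned int, VHOST_MAX_PEND,
					     vq->num >> 2);
	}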

Changes
  v1 -> v2
    - replaced min with typed min_t
    - avoid unnecessary whitespace change

Link:http://lkml.<email address hidden>
Link:http://<email address hidden>
Signed-off-by: Willem de Bruijn <email address hidden>
Signed-off-by: David S. Miller <email address hidden>

CVE-2019-3900

(cherry picked from commit 1e6f74536de08b5e50cf0e37e735911c2cef7c62)
Signed-off-by: Tyler Hicks <email address hidden>

59c0dc7... by Jason Wang

vhost_net: tx batching

This patch tries to utilize tuntap rx batching by peeking at the tx
virtqueue during transmission; if there are more available buffers in
the virtqueue, set the MSG_MORE flag as a hint for the backend
(e.g. tuntap) to batch the packets.
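
Roughly (following upstream 0ed005ce02fa; exact context differs in the
Xenial backport), the hint is set in handle_tx() like so:

		/* Hint the backend to batch when more buffers are queued
		 * and we are not about to stop transmitting. */
		if (total_len < VHOST_NET_WEIGHT &&
		    !vhost_vq_avail_empty(&net->dev, vq) &&
		    likely(!vhost_exceeds_maxpend(net)))
			msg.msg_flags |= MSG_MORE;
		else
			msg.msg_flags &= ~MSG_MORE;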

Reviewed-by: Stefan Hajnoczi <email address hidden>
Signed-off-by: Jason Wang <email address hidden>
Acked-by: Michael S. Tsirkin <email address hidden>
Signed-off-by: David S. Miller <email address hidden>

CVE-2019-3900

(backported from commit 0ed005ce02fa0a88e5e6b7b5f7ff452171881610)
[tyhicks: Minor context adjustment due to missing commit 030881372460
 ("vhost_net: basic polling support")]
Signed-off-by: Tyler Hicks <email address hidden>

d85596c... by Jason Wang

vhost: introduce vhost_vq_avail_empty()

This patch introduces a helper which will return true if we're sure
that the available ring is empty for a specific vq. When we're not
sure, e.g. on vq access failure, return false instead. This could be
used by busy polling code to exit the busy loop.
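
The helper as introduced upstream (d4a60603fa0b), shown as a sketch:

	bool vhost_vq_avail_empty(struct vhost_dev *dev,
				  struct vhost_virtqueue *vq)
	{
		__virtio16 avail_idx;
		int r;

		/* If the avail ring cannot be read, report non-empty so
		 * callers do not keep busy polling on a broken vq. */
		r = __get_user(avail_idx, &vq->avail->idx);
		if (r)
			return false;

		return vhost16_to_cpu(vq, avail_idx) == vq->avail_idx;
	}
	EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);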

Signed-off-by: Jason Wang <email address hidden>
Signed-off-by: Michael S. Tsirkin <email address hidden>

CVE-2019-3900

(cherry picked from commit d4a60603fa0b42012decfa058dfa44cffde7a10c)
Signed-off-by: Tyler Hicks <email address hidden>

529f12a... by Greg Kroah-Hartman <email address hidden>

Linux 4.4.186

BugLink: https://bugs.launchpad.net/bugs/1838467

Signed-off-by: Connor Kuehl <email address hidden>
Signed-off-by: Khalid Elmously <email address hidden>