Currently an skb requiring TSO may not fit within a minimum-size TX
queue. The TX queue selected for the skb may stall and trigger the TX
watchdog repeatedly (since the problem skb will be retried after the
TX reset). This issue is designated as CVE-2012-3412.
Set the maximum number of TSO segments for our devices to 100. This
should make no difference to behaviour unless the actual MSS is less
than about 700. Increase the minimum TX queue size accordingly to
allow for 2 worst-case skbs, so that there will definitely be space
to add an skb after we wake a queue.
To avoid invalidating existing configurations, change
efx_ethtool_set_ringparam() to fix up values that are too small rather
than returning -EINVAL.
Signed-off-by: Ben Hutchings <email address hidden>
Signed-off-by: David S. Miller <email address hidden>
(back ported from commit 7e6d06f0de3f74ca929441add094518ae332257c)
Signed-off-by: Tim Gardner <email address hidden>
Acked-by: Herton Krzesinski <email address hidden>
Signed-off-by: Andy Whitcroft <email address hidden>
The 'page size' for PCIe DMA, i.e. the alignment of boundaries at
which DMA must be broken, is 4KB. Name this value as EFX_PAGE_SIZE
and use it in efx_max_tx_len(). Redefine EFX_BUF_SIZE as
EFX_PAGE_SIZE since its value is also a result of that requirement,
and use it in efx_init_special_buffer().
Signed-off-by: Ben Hutchings <email address hidden>
(back ported from commit 5b6262d0ccf759a16fabe11d904a2531125a4b71)
Cache the device gso_max_segs in sock::sk_gso_max_segs and use it to
limit the size of TSO skbs. This avoids the need to fall back to
software GSO for local TCP senders.
Signed-off-by: Ben Hutchings <email address hidden>
Signed-off-by: David S. Miller <email address hidden>
(back ported from commit 1485348d2424e1131ea42efc033cbd9366462b01)
Acked-by: Herton Krzesinski <email address hidden>
Signed-off-by: Tim Gardner <email address hidden>
Signed-off-by: Andy Whitcroft <email address hidden>
7accfd1...
by
Neal Cardwell <email address hidden>
tcp: do not scale TSO segment size with reordering degree
Since 2005 (c1b4a7e69576d65efc31a8cea0714173c2841244)
tcp_tso_should_defer has been using tcp_max_burst() as a target limit
for deciding how large to make outgoing TSO packets when not using
sysctl_tcp_tso_win_divisor. But since 2008
(dd9e0dda66ba38a2ddd1405ac279894260dc5c36) tcp_max_burst() returns the
reordering degree. We should not have tcp_tso_should_defer attempt to
build larger segments just because there is more reordering. This
commit splits the notion of deferral size used in TSO from the notion
of burst size used in cwnd moderation, and returns the TSO deferral
limit to its original value.
Signed-off-by: Neal Cardwell <email address hidden>
Signed-off-by: David S. Miller <email address hidden>
(cherry picked from commit 6b5a5c0dbb11dcff4e1b0f1ef87a723197948ed4)
Acked-by: Herton Krzesinski <email address hidden>
Signed-off-by: Tim Gardner <email address hidden>
Signed-off-by: Andy Whitcroft <email address hidden>
A peer (or local user) may cause TCP to use a nominal MSS of as little
as 88 (actual MSS of 76 with timestamps). Given that we have a
sufficiently prodigious local sender and the peer ACKs quickly enough,
it is nevertheless possible to grow the window for such a connection
to the point that we will try to send just under 64K at once. This
results in a single skb that expands to 861 segments.
In some drivers with TSO support, such an skb will require hundreds of
DMA descriptors; a substantial fraction of a TX ring or even more than
a full ring. The TX queue selected for the skb may stall and trigger
the TX watchdog repeatedly (since the problem skb will be retried
after the TX reset). This particularly affects sfc, for which the
issue is designated as CVE-2012-3412.
Therefore:
1. Add the field net_device::gso_max_segs holding the device-specific
limit.
2. In netif_skb_features(), if the number of segments is too high then
mask out GSO features to force fall back to software GSO.
Signed-off-by: Ben Hutchings <email address hidden>
Signed-off-by: David S. Miller <email address hidden>
(cherry picked from commit 30b678d844af3305cda5953467005cebb5d7b687)
Acked-by: Herton Krzesinski <email address hidden>
Signed-off-by: Tim Gardner <email address hidden>
Signed-off-by: Andy Whitcroft <email address hidden>
a9a23ae...
by
Alex Williamson <email address hidden>
KVM: unmap pages from the iommu when slots are removed
We've been adding new mappings, but not destroying old mappings.
This can lead to a page leak as pages are pinned using
get_user_pages, but only unpinned with put_page if they still
exist in the memslots list on vm shutdown. A memslot that is
destroyed while an iommu domain is enabled for the guest will
therefore result in an elevated page reference count that is
never cleared.
Additionally, without this fix, the iommu is only programmed
with the first translation for a gpa. This can result in
peer-to-peer errors if a mapping is destroyed and replaced by a
new mapping at the same gpa as the iommu will still be pointing
to the original, pinned memory address.
Signed-off-by: Alex Williamson <email address hidden>
Signed-off-by: Marcelo Tosatti <email address hidden>
(cherry picked from commit 32f6daad4651a748a58a3ab6da0611862175722f)
Conflicts:
virt/kvm/iommu.c
Signed-off-by: Steve Conklin <email address hidden>
Acked-by: Stefan Bader <email address hidden>
Signed-off-by: Tim Gardner <email address hidden>
Signed-off-by: Andy Whitcroft <email address hidden>
Jay Fenlason (<email address hidden>) found a bug,
that recvfrom() on an RDS socket can return the contents of random kernel
memory to userspace if it was called with a address length larger than
sizeof(struct sockaddr_in).
rds_recvmsg() also fails to set the addr_len paramater properly before
returning, but that's just a bug.
There are also a number of cases wher recvfrom() can return an entirely bogus
address. Anything in rds_recvmsg() that returns a non-negative value but does
not go through the "sin = (struct sockaddr_in *)msg->msg_name;" code path
at the end of the while(1) loop will return up to 128 bytes of kernel memory
to userspace.
And I write two test programs to reproduce this bug, you will see that in
rds_server, fromAddr will be overwritten and the following sock_fd will be
destroyed.
Yes, it is the programmer's fault to set msg_namelen incorrectly, but it is
better to make the kernel copy the real length of address to user space in
such case.
How to run the test programs ?
I test them on 32bit x86 system, 3.5.0-rc7.
4 you will see something like:
server is waiting to receive data...
old socket fd=3
server received data from client:data from client
msg.msg_namelen=32
new socket fd=-1067277685
sendmsg()
: Bad file descriptor
printf("server is waiting to receive data...\n");
msg.msg_name = &fromAddr;
/*
* I add 16 to sizeof(fromAddr), ie 32,
* and pay attention to the definition of fromAddr,
* recvmsg() will overwrite sock_fd,
* since kernel will copy 32 bytes to userspace.
*
* If you just use sizeof(fromAddr), it works fine.
* */
msg.msg_namelen = sizeof(fromAddr) + 16;
/* msg.msg_namelen = sizeof(fromAddr); */
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_iov->iov_base = recvBuffer;
msg.msg_iov->iov_len = 128;
msg.msg_control = 0;
msg.msg_controllen = 0;
msg.msg_flags = 0;
while (1) {
printf("old socket fd=%d\n", sock_fd);
if (recvmsg(sock_fd, &msg, 0) == -1) {
perror("recvmsg() error\n");
close(sock_fd);
exit(1);
}
printf("server received data from client:%s\n", recvBuffer);
printf("msg.msg_namelen=%d\n", msg.msg_namelen);
printf("new socket fd=%d\n", sock_fd);
strcat(recvBuffer, "--data from server");
if (sendmsg(sock_fd, &msg, 0) == -1) {
perror("sendmsg()\n");
close(sock_fd);
exit(1);
}
}
close(sock_fd);
return 0;
}
Signed-off-by: Weiping Pan <email address hidden>
Signed-off-by: David S. Miller <email address hidden>
(cherry picked from commit 06b6a1cf6e776426766298d055bb3991957d90a7)
Signed-off-by: Brad Figg <email address hidden>
Signed-off-by: Tim Gardner <email address hidden>