~vlad-a/ubuntu/+source/linux/+git/cosmic:test

Last commit made on 2019-03-01
Get this branch:
git clone -b test https://git.launchpad.net/~vlad-a/ubuntu/+source/linux/+git/cosmic

Recent commits

050ef2a... by Yuval Avnery <email address hidden>

net/mlx5e: Register/Unregister to event only on offload init/cleanup

A SmartNIC running Ubuntu that loaded at the same time as the host
caused a kernel panic.

The ECPF should register for host events only after eswitch offloads mode
is ready, and should unregister when eswitch offloads mode is unloading.

Signed-off-by: Yuval Avnery <email address hidden>

7b5d134... by Yuval Avnery <email address hidden>

net/mlx5e: Add a lock on tir list

The TIR refresh loops over a global list of TIRs while netdevs add and
remove TIRs from that list, so access to the list must be protected by
a lock.
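
The locking pattern described in this commit can be sketched in user space. This is a minimal Python illustration, not the driver code: `TirRegistry`, `add`, `remove`, and `refresh` are hypothetical names, and a `threading.Lock` stands in for whatever kernel lock the patch actually uses.

```python
import threading


class TirRegistry:
    """Hypothetical stand-in for the shared list of TIRs."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tirs = []

    def add(self, tir):
        # Writers take the lock before touching the list.
        with self._lock:
            self._tirs.append(tir)

    def remove(self, tir):
        with self._lock:
            self._tirs.remove(tir)

    def refresh(self, update):
        # Hold the lock for the whole walk so concurrent add/remove
        # calls cannot modify the list mid-iteration.
        with self._lock:
            return [update(tir) for tir in self._tirs]
```

The point of the sketch is that the iteration and the list mutations share one lock, so a refresh never observes a half-updated list.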

814a619... by Dan Jurgens

net/mlx5e: Implement get_link_ksettings for VF reps

DPDK relies on this to get link parameters.

Signed-off-by: Daniel Jurgens <email address hidden>

aa0e02e... by Moshe Shemesh <email address hidden>

net/mlx5e: RX, Verify MPWQE stride size is in range

Add a check that the MPWQE stride size is within the range supported by
HW. If the calculated MPWQE stride size exceeds that range, linear SKB
can't be used and we should use non-linear MPWQE instead.
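
The range check described in this commit can be sketched as follows. This is an illustration under assumptions, not the driver logic: the bounds `MIN_LOG_STRIDE_SZ` and `MAX_LOG_STRIDE_SZ` are placeholder values (the commit message does not state the hardware limits), and `use_linear_skb` is a hypothetical helper name.

```python
# Assumed, illustrative bounds on log2(stride size). The real limits
# come from the device and are not given in the commit message.
MIN_LOG_STRIDE_SZ = 6
MAX_LOG_STRIDE_SZ = 13


def order_base_2(n):
    """Smallest k such that 2**k >= n, for n >= 1."""
    return (n - 1).bit_length()


def use_linear_skb(frag_sz):
    """Return True if a stride large enough for frag_sz is in range.

    A False result means the caller should fall back to the
    non-linear MPWQE path.
    """
    log_stride = max(order_base_2(frag_sz), MIN_LOG_STRIDE_SZ)
    return log_stride <= MAX_LOG_STRIDE_SZ
```

With these assumed bounds, a fragment of 8192 bytes still qualifies for linear SKB, while 8193 bytes rounds up to a stride that is out of range.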

Fixes: 619a8f2a42f1 ("net/mlx5e: Use linear SKB in Striding RQ")
Signed-off-by: Moshe Shemesh <email address hidden>
Reviewed-by: Tariq Toukan <email address hidden>
Signed-off-by: Saeed Mahameed <email address hidden>

480eb9e... by Moshe Shemesh <email address hidden>

net/mlx5e: RX, verify received packet size in Linear Striding RQ

In case of striding RQ, we use MPWRQ (Multi Packet WQE RQ), which means
that a WQE (RX descriptor) can be used for many packets and so the WQE is
much bigger than the MTU. In virtualization setups where the port MTU can
be larger than the VF MTU, a received packet bigger than the VF MTU won't
be dropped by HW, because the receive WQE is not too small for it. If we
use linear SKB in striding RQ, each stride has room only for an MTU-sized
payload and skb info, so an oversized packet can cause a crash by crossing
the allocated page boundary upon the call to build_skb. The driver
therefore needs to check the packet size and drop oversized packets.

Introduce new SW rx counter, rx_oversize_pkts_sw_drop, which counts the
number of packets dropped by the driver for being too large.

As a new field is added to the RQ struct, re-open the channels whenever
this field is being used in datapath (i.e., in the case of linear
Striding RQ).
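
The size check and drop counter described in this commit can be modeled roughly as below. This is a user-space sketch, not the driver datapath: `RxQueue`, `hw_mtu`, and `receive` are hypothetical names, and only the counter name `rx_oversize_pkts_sw_drop` is taken from the commit message.

```python
from dataclasses import dataclass


@dataclass
class RxQueue:
    """Hypothetical model of a receive queue in linear Striding RQ mode."""

    hw_mtu: int                        # largest packet a stride was sized for
    rx_oversize_pkts_sw_drop: int = 0  # counter named in the commit message

    def receive(self, packet: bytes):
        # Hardware may deliver packets larger than the VF MTU, so the
        # software path checks the size before building a buffer around it.
        if len(packet) > self.hw_mtu:
            self.rx_oversize_pkts_sw_drop += 1
            return None  # dropped
        return packet
```

Packets at or below the configured size pass through unchanged; anything larger is dropped and counted.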

Fixes: 619a8f2a42f1 ("net/mlx5e: Use linear SKB in Striding RQ")
Signed-off-by: Moshe Shemesh <email address hidden>
Reviewed-by: Tariq Toukan <email address hidden>
Signed-off-by: Saeed Mahameed <email address hidden>

92a635e... by Sean Tranchetti <email address hidden>

net: udp: fix handling of CHECKSUM_COMPLETE packets

Current handling of CHECKSUM_COMPLETE packets by the UDP stack is
incorrect for any packet that has an incorrect checksum value.

udp4/6_csum_init() will both make a call to
__skb_checksum_validate_complete() to initialize/validate the csum
field when receiving a CHECKSUM_COMPLETE packet. When this packet
fails validation, skb->csum will be overwritten with the pseudoheader
checksum so the packet can be fully validated by software, but the
skb->ip_summed value will be left as CHECKSUM_COMPLETE so that way
the stack can later warn the user about their hardware spewing bad
checksums. Unfortunately, leaving the SKB in this state can cause
problems later on in the checksum calculation.

Since the packet is still marked as CHECKSUM_COMPLETE,
udp_csum_pull_header() will SUBTRACT the checksum of the UDP header
from skb->csum instead of adding it, leaving us with a garbage value
in that field. Once we try to copy the packet to userspace in the
udp4/6_recvmsg(), we'll make a call to skb_copy_and_csum_datagram_msg()
to checksum the packet data and add it in the garbage skb->csum value
to perform our final validation check.

Since the value we're validating is not the proper checksum, it's possible
that the folded value could come out to 0, causing us not to drop the
packet. Instead, we believe that the packet was checksummed incorrectly
by hardware since skb->ip_summed is still CHECKSUM_COMPLETE, and we attempt
to warn the user with netdev_rx_csum_fault(skb->dev).

Unfortunately, since this is the UDP path, skb->dev has been overwritten
by skb->dev_scratch and is no longer a valid pointer, so we end up
reading invalid memory.

This patch addresses this problem in two ways:
 1) Do not use the dev pointer when calling netdev_rx_csum_fault()
    from skb_copy_and_csum_datagram_msg(). Since this gets called
    from the UDP path where skb->dev has been overwritten, we have
    no way of knowing if the pointer is still valid. Also for the
    sake of consistency with the other uses of
    netdev_rx_csum_fault(), don't attempt to call it if the
    packet was checksummed by software.

 2) Add better CHECKSUM_COMPLETE handling to udp4/6_csum_init().
    If we receive a packet that's CHECKSUM_COMPLETE that fails
    verification (i.e. skb->csum_valid == 0), check who performed
    the calculation. It's possible that the checksum was done in
    software by the network stack earlier (such as Netfilter's
    CONNTRACK module), and if that says the checksum is bad,
    we can drop the packet immediately instead of waiting until
    we try and copy it to userspace. Otherwise, we need to
    mark the SKB as CHECKSUM_NONE, since the skb->csum field
    no longer contains the full packet checksum after the
    call to __skb_checksum_validate_complete().
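
The arithmetic behind this bug can be illustrated with a simplified user-space model of ones'-complement checksums. This is a sketch, not the kernel code: all function names below are hypothetical, the pseudo-header sum is passed in as a precomputed 16-bit value, and the `computed_by_software` flag is an assumed stand-in for however the stack records who performed the calculation.

```python
def csum_add(a, b):
    """Ones'-complement 16-bit addition with end-around carry."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def csum_sub(a, b):
    """Ones'-complement subtraction: add the complement of b."""
    return csum_add(a, (~b) & 0xFFFF)


def checksum(data):
    """Ones'-complement sum of big-endian 16-bit words (unfolded)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total = csum_add(total, (data[i] << 8) | data[i + 1])
    return total


def final_csum(pseudo, header, payload, subtract_header):
    """Model the value checked at copy-to-userspace time.

    After a failed validation the running sum holds only the
    pseudo-header sum. Adding the UDP header and payload yields 0xFFFF
    for a valid packet; subtracting the header (the CHECKSUM_COMPLETE
    path) yields an unrelated value.
    """
    h = checksum(header)
    csum = csum_sub(pseudo, h) if subtract_header else csum_add(pseudo, h)
    return csum_add(csum, checksum(payload))


def on_failed_complete_validation(computed_by_software):
    """Decision from point 2: drop now, or fall back to CHECKSUM_NONE."""
    return "drop" if computed_by_software else "CHECKSUM_NONE"
```

In this model, a packet whose checksum field is correct sums to 0xFFFF only on the "add" path, which is why the packet has to be re-marked before the header is pulled.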

Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
Fixes: c84d949057ca ("udp: copy skb->truesize in the first cache line")
Cc: Sam Kumar <email address hidden>
Cc: Eric Dumazet <email address hidden>
Signed-off-by: Sean Tranchetti <email address hidden>
Signed-off-by: David S. Miller <email address hidden>

8d41b4b... by Dan Jurgens

net/mlx5: Use correct number of vports when creating peer miss rules

Total vports include the uplink vport. So an extra peer miss rule for a
non-existent vport was being created and causing steering problems.
Reduce the number of peer miss rules to the number of non-uplink vports.

Signed-off-by: Daniel Jurgens <email address hidden>

68b927a... by Eli Britstein <email address hidden>

net/mlx5e: Fix cb_ident duplicate in indirect block register

Previously the identifier used for indirect block callback registry
and for block rule cb registry (when done via indirect blocks) was the
pointer to the tunnel netdev we were interested in receiving updates on.
This worked fine if a single PF existed that registered one callback for
the tunnel netdev of interest. However, if multiple PFs are in place then
the 2nd PF tries to register with the same tunnel netdev identifier. This
leads to EEXIST errors and/or incorrect cb deletions.

Prevent this conflict by using the rpriv pointer as the identifier for
netdev indirect block cb registry, allowing each PF to register a unique
callback per tunnel netdev. For block cb registry, the same PF may
register multiple cbs to the same block if using TC shared blocks.
Instead of the rpriv, use the pointer to the allocated indr_priv data as
the identifier here. This means that there can be a unique block callback
for each PF/tunnel netdev combo.
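
The identifier collision described in this commit can be reproduced with a toy registry. This is an illustrative sketch only: `CallbackRegistry` and its `(cb, cb_ident)` key are a hypothetical model of a callback registry, and the strings used as identifiers stand in for the pointers the driver would pass.

```python
import errno


class CallbackRegistry:
    """Hypothetical registry that rejects duplicate (cb, ident) pairs."""

    def __init__(self):
        self._entries = set()

    def register(self, cb, cb_ident):
        key = (cb, cb_ident)
        if key in self._entries:
            return -errno.EEXIST
        self._entries.add(key)
        return 0


def setup_cb():
    """Placeholder callback shared by every PF."""


# Before the fix: both PFs identify themselves by the tunnel netdev.
old = CallbackRegistry()
old_results = [old.register(setup_cb, "tunnel0") for _pf in ("pf0", "pf1")]

# After the fix: each PF passes its own private pointer as identifier.
new = CallbackRegistry()
new_results = [new.register(setup_cb, f"rpriv_{pf}") for pf in ("pf0", "pf1")]
```

In the first case the second registration fails with EEXIST; in the second case both succeed because the identifiers differ per PF.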

Fixes: f5bc2c5de101 ("net/mlx5e: Support TC indirect block notifications for eswitch uplink reprs")
Signed-off-by: Eli Britstein <email address hidden>
Reviewed-by: Oz Shlomo <email address hidden>
Signed-off-by: Saeed Mahameed <email address hidden>

5071f04... by Dan Jurgens

net/mlx5: Fix LAG SRIOV mode check for BF

On BlueField, sriov_enabled returns false because there are no VFs;
however, SRIOV is effectively enabled because we use switchdev mode for
the host PFs. Expand the checks to catch this and avoid enabling RoCE LAG
on BF in switchdev mode.

Signed-off-by: Daniel Jurgens <email address hidden>

6c8711c... by Or Gerlitz <email address hidden>

net/mlx5e: Move to use common phys port names for vport representors

With VF LAG commit 491c37e49b48 "net/mlx5e: In case of LAG, one switch
parent id is used for all representors", both uplinks and all the VFs
(on both of them) get the same switchdev id.

This breaks the provisioning system's method of identifying the rep of a
given VF from the parent PF PCI device using the switchdev id and physical
port name, since VFm of PF0 will have the same (id, name) as VFm of PF1.

To fix that, we align to use the framework agreed upstream and set by
nfp commit 168c478e107e "nfp: wire get_phys_port_name on representors":

$ cat /sys/class/net/eth4_*/phys_port_name
p0
pf0vf0
pf0vf1

Now, the names will be different, e.g. pf0vf0 vs. pf1vf0.
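
The naming scheme shown in the sample output can be expressed as a small formatting helper. This is a sketch inferred from the example names in this commit message (`p0`, `pf0vf0`, `pf1vf0`); `phys_port_name` here is a hypothetical Python function, not the driver callback.

```python
def phys_port_name(pf, vf=None):
    """Return 'p<N>' for an uplink or 'pf<N>vf<M>' for a VF representor."""
    if vf is None:
        return f"p{pf}"
    return f"pf{pf}vf{vf}"


# The same VF index on two PFs now yields distinct names.
names = [phys_port_name(0, 0), phys_port_name(1, 0)]
```

Because the PF index is part of the name, the (switchdev id, port name) pair stays unique even when both PFs share one switchdev id.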

Fixes: 491c37e49b48 ("net/mlx5e: In case of LAG, one switch parent id is used for all representors")
Signed-off-by: Or Gerlitz <email address hidden>
Reported-by: Waleed Musa <email address hidden>
Reviewed-by: Roi Dayan <email address hidden>
Signed-off-by: Saeed Mahameed <email address hidden>