Merge ~bodong-wang/ubuntu/+source/linux-bluefield:nvidia-devlink into ~canonical-kernel/ubuntu/+source/linux-bluefield/+git/focal:master-next

Proposed by Bodong Wang
Status: Needs review
Proposed branch: ~bodong-wang/ubuntu/+source/linux-bluefield:nvidia-devlink
Merge into: ~canonical-kernel/ubuntu/+source/linux-bluefield/+git/focal:master-next
Diff against target: 10740 lines (+6115/-1162)
35 files modified
Documentation/networking/devlink/devlink-trap.rst (+595/-0)
Documentation/networking/devlink/index.rst (+14/-0)
Documentation/networking/index.rst (+2/-3)
MAINTAINERS (+1/-0)
debian.bluefield/config/config.common.ubuntu (+8/-23)
dev/null (+0/-33)
drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c (+15/-17)
drivers/net/ethernet/mellanox/mlx4/crdump.c (+27/-9)
drivers/net/ethernet/mellanox/mlx4/main.c (+11/-2)
drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c (+5/-3)
drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c (+5/-3)
drivers/net/ethernet/mellanox/mlx5/core/health.c (+10/-6)
drivers/net/ethernet/mellanox/mlxsw/core.c (+18/-4)
drivers/net/ethernet/mellanox/mlxsw/core.h (+5/-0)
drivers/net/ethernet/mellanox/mlxsw/spectrum.h (+1/-1)
drivers/net/ethernet/netronome/nfp/nfp_devlink.c (+10/-4)
drivers/net/ethernet/pensando/ionic/ionic_devlink.c (+3/-2)
drivers/net/netdevsim/dev.c (+24/-9)
include/net/devlink.h (+871/-57)
include/net/genetlink.h (+23/-7)
include/net/netlink.h (+47/-28)
include/net/pkt_cls.h (+1/-0)
include/net/red.h (+38/-0)
include/trace/events/devlink.h (+37/-0)
include/uapi/linux/devlink.h (+152/-0)
include/uapi/linux/pkt_sched.h (+17/-0)
lib/nlattr.c (+10/-10)
net/Kconfig (+0/-1)
net/core/devlink.c (+3855/-777)
net/core/drop_monitor.c (+46/-16)
net/dsa/dsa2.c (+11/-6)
net/netlink/genetlink.c (+183/-121)
net/sched/act_api.c (+7/-12)
net/sched/sch_red.c (+60/-8)
tools/testing/selftests/drivers/net/netdevsim/devlink.sh (+3/-0)
Reviewer           Review Type   Date Requested   Status
Canonical Kernel                                  Pending
Review via email: mp+416211@code.launchpad.net

This proposal supersedes a proposal from 2022-02-28.

Commit message

Managing the TX rate of VFs becomes a non-trivial task when a large number of
VFs are used. This issue can be handled with a grouping mechanism.

Currently the driver provides two ways to limit the TX rate of a VF: the TC
police action and the NDO API callback. Implementing grouping within these two
infrastructures is problematic, for the following reasons:

NDO API rate limiting is a legacy feature, even though it is available in
switchdev mode, and extending it with a new abstraction is not desirable;

The TC police action is flow based and requires a net device with a Qdisc on
it, and implementing grouping there would bring unwanted complications.

For these reasons, devlink is the most appropriate place.

In order to cherry-pick the devlink patch for VF group rate limiting, the
devlink API patches that precede it are needed to maintain a clear history.

Unmerged commits

e1f2032... by Moshe Shemesh <email address hidden>

devlink: Fix reload stats structure

BugLink: https://bugs.launchpad.net/bugs/1962490

Fix the reload stats structure exposed to the user. Change the stats structure
hierarchy so that the reload action is the parent of the stat entry, and
each stat entry includes a value per limit. This will also help to avoid
string concatenation in iproute2 output.

Reload stats structure before this fix:
"stats": {
    "reload": {
        "driver_reinit": 2,
        "fw_activate": 1,
        "fw_activate_no_reset": 0
     }
}

After this fix:
"stats": {
    "reload": {
        "driver_reinit": {
            "unspecified": 2
        },
        "fw_activate": {
            "unspecified": 1,
            "no_reset": 0
        }
    }
}

Fixes: a254c264267e ("devlink: Add reload stats")
Signed-off-by: Moshe Shemesh <email address hidden>
Reviewed-by: Jiri Pirko <email address hidden>
Link: https://<email address hidden>
Signed-off-by: Jakub Kicinski <email address hidden>
(cherry picked from commit 5204bb683c1633e550c2124ccc2358dd645a80db)
Signed-off-by: Bodong Wang <email address hidden>
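
To make the restructuring in the commit above concrete, here is a minimal Python sketch that converts the flat "before" layout into the nested "after" layout. It is illustrative only: the function name nest_reload_stats and the key mapping are assumptions derived from the example in the commit message, not code from the kernel or iproute2.

```python
def nest_reload_stats(flat):
    """Convert flat reload stats into the action -> limit -> value layout."""
    # Hypothetical mapping from each flat key to an (action, limit) pair,
    # inferred from the before/after example in the commit message.
    key_map = {
        "driver_reinit": ("driver_reinit", "unspecified"),
        "fw_activate": ("fw_activate", "unspecified"),
        "fw_activate_no_reset": ("fw_activate", "no_reset"),
    }
    nested = {}
    for key, value in flat.items():
        action, limit = key_map[key]
        # The reload action becomes the parent; each limit holds one value.
        nested.setdefault(action, {})[limit] = value
    return nested


before = {"driver_reinit": 2, "fw_activate": 1, "fw_activate_no_reset": 0}
after = nest_reload_stats(before)
```

With the nested form, a consumer can look up a value by action and limit separately instead of building a concatenated key such as "fw_activate_no_reset".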

c8269a0... by Vladyslav Tarasiuk <email address hidden>

devlink: Rework devlink health reporter destructor

BugLink: https://bugs.launchpad.net/bugs/1962490

Devlink keeps its own reference to every reporter in a list and inits
refcount to 1 upon reporter's creation. Existing destructor waits to
free the memory indefinitely using msleep() until all references except
devlink's own are put.

Rework this mechanism by moving memory free routine to a separate
function, which is called when the last reporter reference is put.

In addition, this allows __devlink_health_reporter_destroy() to be called
while holding the reporters list mutex, symmetric to
__devlink_health_reporter_create(), which is required by a follow-up
patch.

Signed-off-by: Vladyslav Tarasiuk <email address hidden>
Reviewed-by: Moshe Shemesh <email address hidden>
Reviewed-by: Jiri Pirko <email address hidden>
Signed-off-by: David S. Miller <email address hidden>
(cherry picked from commit 3c5584bf0a0493e8d232ade65f4b9c5e995f3a0c)
Signed-off-by: Bodong Wang <email address hidden>
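
The reference-counting pattern described in the commit above can be sketched as follows. This is a single-threaded Python illustration, not the kernel implementation: the Reporter class, its get()/put() methods, and the on_free callback are hypothetical names chosen for the example.

```python
class Reporter:
    """Toy health reporter whose memory is released on the last put()."""

    def __init__(self, name, on_free):
        self.name = name
        self.refcount = 1        # devlink's own reference, taken at creation
        self._on_free = on_free  # hypothetical stand-in for the free routine

    def get(self):
        # Another user takes a reference to the reporter.
        self.refcount += 1

    def put(self):
        # Drop one reference; whoever drops the last one triggers the free.
        self.refcount -= 1
        if self.refcount == 0:
            self._on_free(self)


freed = []
reporter = Reporter("tx", freed.append)
reporter.get()   # a user holds a reference
reporter.put()   # destroy path: devlink drops its own reference, no waiting
reporter.put()   # the user drops the last reference, which frees the reporter
```

The point of the design is that the destroy path no longer polls with a sleep until other holders go away; it just drops its own reference and returns.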

7c1c03e... by Ido Schimmel <email address hidden>

drop_monitor: Convert to using devlink tracepoint

BugLink: https://bugs.launchpad.net/bugs/1962490

Convert drop monitor to use the recently introduced
'devlink_trap_report' tracepoint instead of having devlink call into
drop monitor.

This is both consistent with software originated drops ('kfree_skb'
tracepoint) and also allows drop monitor to be built as a module and
still report hardware originated drops.

Signed-off-by: Ido Schimmel <email address hidden>
Reviewed-by: Jiri Pirko <email address hidden>
Signed-off-by: David S. Miller <email address hidden>
(backported from commit 8ee2267ad33e0ba021e9dd9b437f773906cd99d6)
Signed-off-by: Bodong Wang <email address hidden>
[bodong: ignore doc and drop monitor]

Conflicts:
 MAINTAINERS
 include/net/drop_monitor.h

86e7116... by Ido Schimmel <email address hidden>

devlink: Add 'control' trap type

BugLink: https://bugs.launchpad.net/bugs/1962490

This type is used for traps that trap control packets such as ARP
request and IGMP query to the CPU.

Do not report such packets to the kernel's drop monitor, as they were not
dropped by the device, nor did they encounter an exception during forwarding.

Signed-off-by: Ido Schimmel <email address hidden>
Reviewed-by: Jiri Pirko <email address hidden>
Signed-off-by: David S. Miller <email address hidden>
(cherry picked from commit 30a4e9a29ab9aadfe6c5386ae4aa396b1d2556c2)
Signed-off-by: Bodong Wang <email address hidden>
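
As a rough illustration of how the three trap types differ after the commit above, the following Python sketch encodes the two decisions described in the commit message and in the devlink-trap documentation included in this proposal: whether a trapped packet is reported to the drop monitor, and whether it is injected into the kernel's Rx path. The enum and function names are hypothetical and do not correspond to kernel symbols.

```python
from enum import Enum


class TrapType(Enum):
    DROP = "drop"
    EXCEPTION = "exception"
    CONTROL = "control"


def reported_to_drop_monitor(trap_type):
    # Control packets were neither dropped nor hit an exception,
    # so they are the only type not reported to the drop monitor.
    return trap_type is not TrapType.CONTROL


def injected_to_rx_path(trap_type):
    # Dropped packets are only processed by devlink; exception and
    # control packets continue to the kernel's Rx path.
    return trap_type is not TrapType.DROP


summary = {
    t.value: (reported_to_drop_monitor(t), injected_to_rx_path(t))
    for t in TrapType
}
```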

2cf5515... by Leon Romanovsky <email address hidden>

devlink: Remove check of always valid devlink pointer

BugLink: https://bugs.launchpad.net/bugs/1962490

Devlink objects are accessible only after they were registered and
have valid devlink_*->devlink pointers.

Remove that check and simplify respective fill functions as an outcome
of such change.

Signed-off-by: Leon Romanovsky <email address hidden>
Signed-off-by: David S. Miller <email address hidden>
(backported from commit 7ca973dc9fe589dc0ab2650641f4c7a19cc49ecd)
Signed-off-by: Bodong Wang <email address hidden>
[bodong: fix conflict due to api change]

Conflicts:
 net/core/devlink.c

777e34d... by Jakub Kicinski <email address hidden>

devlink: factor out building a snapshot notification

BugLink: https://bugs.launchpad.net/bugs/1962490

We'll need to send snapshot info back on the socket
which requested a snapshot to be created. Factor out
constructing a snapshot description from the broadcast
notification code.

v3: new patch

Signed-off-by: Jakub Kicinski <email address hidden>
Reviewed-by: Jiri Pirko <email address hidden>
Reviewed-by: Jacob Keller <email address hidden>
Signed-off-by: David S. Miller <email address hidden>
(cherry picked from commit dd86fec7e06ab792fe470c66a67ff42bf5d72b91)
Signed-off-by: Bodong Wang <email address hidden>

24d0d78... by Leon Romanovsky <email address hidden>

devlink: Allocate devlink directly in requested net namespace

BugLink: https://bugs.launchpad.net/bugs/1962490

There is no need for the extra call indirection and the check for an
impossible flow in which someone tries to set the namespace without a prior
call to devlink_alloc().

Instead of this extra logic and an additional EXPORT_SYMBOL, use a specialized
devlink allocation function that receives the net namespace as an argument.

Such a specialized API makes it clear when devlink is initialized in the wrong
net namespace, and ensures that kernel users don't try to change the devlink
namespace under the hood.

Reviewed-by: Jiri Pirko <email address hidden>
Signed-off-by: Leon Romanovsky <email address hidden>
Signed-off-by: Jakub Kicinski <email address hidden>
(backported from commit 26713455048eb19122b1561b471d30710177ef97)
Signed-off-by: Bodong Wang <email address hidden>
[bodong: ignore netdevsim]

Conflicts:
 drivers/net/netdevsim/dev.c

19bb413... by Leon Romanovsky <email address hidden>

devlink: Break parameter notification sequence to be before/after unload/load driver

BugLink: https://bugs.launchpad.net/bugs/1962490

When namespaces are changed during devlink reload, the driver is unloaded
before devlink accesses its parameters. The commands below cause a
use-after-free bug when trying to get the flow steering mode.

 * ip netns add n1
 * devlink dev reload pci/0000:00:09.0 netns n1

 ==================================================================
 BUG: KASAN: use-after-free in mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
 Read of size 4 at addr ffff888009d04308 by task devlink/275

 CPU: 6 PID: 275 Comm: devlink Not tainted 5.12.0-rc2+ #2853
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 Call Trace:
  dump_stack+0x93/0xc2
  print_address_description.constprop.0+0x18/0x140
  ? mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
  ? mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
  kasan_report.cold+0x7c/0xd8
  ? mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
  mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
  devlink_nl_param_fill+0x1c8/0xe80
  ? __free_pages_ok+0x37a/0x8a0
  ? devlink_flash_update_timeout_notify+0xd0/0xd0
  ? lock_acquire+0x1a9/0x6d0
  ? fs_reclaim_acquire+0xb7/0x160
  ? lock_is_held_type+0x98/0x110
  ? 0xffffffff81000000
  ? lock_release+0x1f9/0x6c0
  ? fs_reclaim_release+0xa1/0xf0
  ? lock_downgrade+0x6d0/0x6d0
  ? lock_is_held_type+0x98/0x110
  ? lock_is_held_type+0x98/0x110
  ? memset+0x20/0x40
  ? __build_skb_around+0x1f8/0x2b0
  devlink_param_notify+0x6d/0x180
  devlink_reload+0x1c3/0x520
  ? devlink_remote_reload_actions_performed+0x30/0x30
  ? mutex_trylock+0x24b/0x2d0
  ? devlink_nl_cmd_reload+0x62b/0x1070
  devlink_nl_cmd_reload+0x66d/0x1070
  ? devlink_reload+0x520/0x520
  ? devlink_get_from_attrs+0x1bc/0x260
  ? devlink_nl_pre_doit+0x64/0x4d0
  genl_family_rcv_msg_doit+0x1e9/0x2f0
  ? mutex_lock_io_nested+0x1130/0x1130
  ? genl_family_rcv_msg_attrs_parse.constprop.0+0x240/0x240
  ? security_capable+0x51/0x90
  genl_rcv_msg+0x27f/0x4a0
  ? genl_get_cmd+0x3c0/0x3c0
  ? lock_acquire+0x1a9/0x6d0
  ? devlink_reload+0x520/0x520
  ? lock_release+0x6c0/0x6c0
  netlink_rcv_skb+0x11d/0x340
  ? genl_get_cmd+0x3c0/0x3c0
  ? netlink_ack+0x9f0/0x9f0
  ? lock_release+0x1f9/0x6c0
  genl_rcv+0x24/0x40
  netlink_unicast+0x433/0x700
  ? netlink_attachskb+0x730/0x730
  ? _copy_from_iter_full+0x178/0x650
  ? __alloc_skb+0x113/0x2b0
  netlink_sendmsg+0x6f1/0xbd0
  ? netlink_unicast+0x700/0x700
  ? lock_is_held_type+0x98/0x110
  ? netlink_unicast+0x700/0x700
  sock_sendmsg+0xb0/0xe0
  __sys_sendto+0x193/0x240
  ? __x64_sys_getpeername+0xb0/0xb0
  ? do_sys_openat2+0x10b/0x370
  ? __up_read+0x1a1/0x7b0
  ? do_user_addr_fault+0x219/0xdc0
  ? __x64_sys_openat+0x120/0x1d0
  ? __x64_sys_open+0x1a0/0x1a0
  __x64_sys_sendto+0xdd/0x1b0
  ? syscall_enter_from_user_mode+0x1d/0x50
  do_syscall_64+0x2d/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7fc69d0af14a
 Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c
 RSP: 002b:00007ffc1d8292f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
 RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fc69d0af14a
 RDX: 0000000000000038 RSI: 0000555f57c56440 RDI: 0000000000000003
 RBP: 0000555f57c56410 R08: 00007fc69d17b200 R09: 000000000000000c
 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

 Allocated by task 146:
  kasan_save_stack+0x1b/0x40
  __kasan_kmalloc+0x99/0xc0
  mlx5_init_fs+0xf0/0x1c50 [mlx5_core]
  mlx5_load+0xd2/0x180 [mlx5_core]
  mlx5_init_one+0x2f6/0x450 [mlx5_core]
  probe_one+0x47d/0x6e0 [mlx5_core]
  pci_device_probe+0x2a0/0x4a0
  really_probe+0x20a/0xc90
  driver_probe_device+0xd8/0x380
  device_driver_attach+0x1df/0x250
  __driver_attach+0xff/0x240
  bus_for_each_dev+0x11e/0x1a0
  bus_add_driver+0x309/0x570
  driver_register+0x1ee/0x380
  0xffffffffa06b8062
  do_one_initcall+0xd5/0x410
  do_init_module+0x1c8/0x760
  load_module+0x6d8b/0x9650
  __do_sys_finit_module+0x118/0x1b0
  do_syscall_64+0x2d/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xae

 Freed by task 275:
  kasan_save_stack+0x1b/0x40
  kasan_set_track+0x1c/0x30
  kasan_set_free_info+0x20/0x30
  __kasan_slab_free+0x102/0x140
  slab_free_freelist_hook+0x74/0x1b0
  kfree+0xd7/0x2a0
  mlx5_unload+0x16/0xb0 [mlx5_core]
  mlx5_unload_one+0xae/0x120 [mlx5_core]
  mlx5_devlink_reload_down+0x1bc/0x380 [mlx5_core]
  devlink_reload+0x141/0x520
  devlink_nl_cmd_reload+0x66d/0x1070
  genl_family_rcv_msg_doit+0x1e9/0x2f0
  genl_rcv_msg+0x27f/0x4a0
  netlink_rcv_skb+0x11d/0x340
  genl_rcv+0x24/0x40
  netlink_unicast+0x433/0x700
  netlink_sendmsg+0x6f1/0xbd0
  sock_sendmsg+0xb0/0xe0
  __sys_sendto+0x193/0x240
  __x64_sys_sendto+0xdd/0x1b0
  do_syscall_64+0x2d/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xae

 The buggy address belongs to the object at ffff888009d04300
  which belongs to the cache kmalloc-128 of size 128
 The buggy address is located 8 bytes inside of
  128-byte region [ffff888009d04300, ffff888009d04380)
 The buggy address belongs to the page:
 page:0000000086a64ecc refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888009d04000 pfn:0x9d04
 head:0000000086a64ecc order:1 compound_mapcount:0
 flags: 0x4000000000010200(slab|head)
 raw: 4000000000010200 ffffea0000203980 0000000200000002 ffff8880050428c0
 raw: ffff888009d04000 000000008020001d 00000001ffffffff 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff888009d04200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff888009d04280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 >ffff888009d04300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                       ^
  ffff888009d04380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff888009d04400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ==================================================================

The right solution to devlink reload is to notify about deletion of
parameters, unload driver, change net namespaces, load driver and notify
about addition of parameters.

Fixes: 070c63f20f6c ("net: devlink: allow to change namespaces during reload")
Reviewed-by: Parav Pandit <email address hidden>
Signed-off-by: Leon Romanovsky <email address hidden>
Signed-off-by: Jakub Kicinski <email address hidden>
(cherry picked from commit 05a7f4a8dff19999ca8a83a35ff4782689de7bfc)
Signed-off-by: Bodong Wang <email address hidden>
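
The ordering fixed by the commit above can be sketched in Python as follows. This is a toy model under stated assumptions: the Driver class, the reload_in_new_netns() helper, the event names, and the "dmfs" value are all illustrative and are not kernel APIs. The getter raises instead of touching freed memory, which stands in for the use-after-free in the KASAN report.

```python
class Driver:
    """Toy driver whose parameter getter needs state freed on unload."""

    def __init__(self):
        self.fs_state = {"mode": "dmfs"}  # illustrative value only

    def unload(self):
        self.fs_state = None              # state is gone after unload

    def load(self):
        self.fs_state = {"mode": "dmfs"}

    def fs_mode_get(self):
        if self.fs_state is None:
            raise RuntimeError("parameter read after driver unload")
        return self.fs_state["mode"]


def reload_in_new_netns(driver, netns):
    events = []
    # 1. Notify about parameter deletion while the driver is still loaded.
    events.append(("param_del", driver.fs_mode_get()))
    # 2. Unload the driver, 3. change namespace, 4. load the driver again.
    driver.unload()
    events.append(("netns_change", netns))
    driver.load()
    # 5. Notify about parameter addition once the state exists again.
    events.append(("param_new", driver.fs_mode_get()))
    return events


events = reload_in_new_netns(Driver(), "n1")
```

Reading the parameter between unload() and load() is what the old sequence effectively did; in this model that raises RuntimeError.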

9468571... by Leon Romanovsky <email address hidden>

devlink: Remove duplicated registration check

BugLink: https://bugs.launchpad.net/bugs/1962490

Both registered flag and devlink pointer are set at the same time
and indicate the same thing - devlink/devlink_port are ready. Instead
of checking ->registered use devlink pointer as an indication.

Signed-off-by: Leon Romanovsky <email address hidden>
Signed-off-by: David S. Miller <email address hidden>
(backported from commit d7907a2b1a3b89bea136025f885035a083525e41)
Signed-off-by: Bodong Wang <email address hidden>
[bodong: resolve conflict due to new line]

Conflicts:
 net/core/devlink.c

61af07b... by Oleksandr Mazur <email address hidden>

net: core: devlink: add dropped stats traps field

BugLink: https://bugs.launchpad.net/bugs/1962490

Whenever a statistics query is issued for a trap, the devlink subsystem
also fills in the statistics 'dropped' field. This field indicates
the number of packets that the HW dropped and failed to report to the device
driver, and thus to the devlink subsystem itself.
If the device driver did not register a callback for querying hard drop
statistics, the 'dropped' field is omitted.

Signed-off-by: Oleksandr Mazur <email address hidden>
Signed-off-by: David S. Miller <email address hidden>
(cherry picked from commit ddee9dbc3d7aec1cd9fdcc671db2dd0016fd0f3d)
Signed-off-by: Bodong Wang <email address hidden>
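
The optional 'dropped' field described in the commit above can be illustrated with a short Python sketch. The function trap_stats_fill, its drop_counter_get parameter, and the dictionary layout are assumptions made for the example; they do not reproduce the netlink attribute layout used by devlink.

```python
def trap_stats_fill(rx_packets, rx_bytes, drop_counter_get=None):
    """Build a trap statistics record, adding 'dropped' only when possible."""
    stats = {"rx": {"packets": rx_packets, "bytes": rx_bytes}}
    # Hypothetical stand-in for the driver's optional hard-drop callback:
    # when the driver did not register one, the field is left out entirely.
    if drop_counter_get is not None:
        stats["rx"]["dropped"] = drop_counter_get()
    return stats


with_callback = trap_stats_fill(10, 640, drop_counter_get=lambda: 3)
without_callback = trap_stats_fill(10, 640)
```

Omitting the field, rather than reporting zero, lets a consumer distinguish "no hard drops" from "the driver cannot report hard drops".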

Preview Diff

1diff --git a/Documentation/networking/devlink-trap.rst b/Documentation/networking/devlink-trap.rst
2deleted file mode 100644
3index 8e90a85..0000000
4--- a/Documentation/networking/devlink-trap.rst
5+++ /dev/null
6@@ -1,209 +0,0 @@
7-.. SPDX-License-Identifier: GPL-2.0
8-
9-============
10-Devlink Trap
11-============
12-
13-Background
14-==========
15-
16-Devices capable of offloading the kernel's datapath and perform functions such
17-as bridging and routing must also be able to send specific packets to the
18-kernel (i.e., the CPU) for processing.
19-
20-For example, a device acting as a multicast-aware bridge must be able to send
21-IGMP membership reports to the kernel for processing by the bridge module.
22-Without processing such packets, the bridge module could never populate its
23-MDB.
24-
25-As another example, consider a device acting as router which has received an IP
26-packet with a TTL of 1. Upon routing the packet the device must send it to the
27-kernel so that it will route it as well and generate an ICMP Time Exceeded
28-error datagram. Without letting the kernel route such packets itself, utilities
29-such as ``traceroute`` could never work.
30-
31-The fundamental ability of sending certain packets to the kernel for processing
32-is called "packet trapping".
33-
34-Overview
35-========
36-
37-The ``devlink-trap`` mechanism allows capable device drivers to register their
38-supported packet traps with ``devlink`` and report trapped packets to
39-``devlink`` for further analysis.
40-
41-Upon receiving trapped packets, ``devlink`` will perform a per-trap packets and
42-bytes accounting and potentially report the packet to user space via a netlink
43-event along with all the provided metadata (e.g., trap reason, timestamp, input
44-port). This is especially useful for drop traps (see :ref:`Trap-Types`)
45-as it allows users to obtain further visibility into packet drops that would
46-otherwise be invisible.
47-
48-The following diagram provides a general overview of ``devlink-trap``::
49-
50- Netlink event: Packet w/ metadata
51- Or a summary of recent drops
52- ^
53- |
54- Userspace |
55- +---------------------------------------------------+
56- Kernel |
57- |
58- +-------+--------+
59- | |
60- | drop_monitor |
61- | |
62- +-------^--------+
63- |
64- |
65- |
66- +----+----+
67- | | Kernel's Rx path
68- | devlink | (non-drop traps)
69- | |
70- +----^----+ ^
71- | |
72- +-----------+
73- |
74- +-------+-------+
75- | |
76- | Device driver |
77- | |
78- +-------^-------+
79- Kernel |
80- +---------------------------------------------------+
81- Hardware |
82- | Trapped packet
83- |
84- +--+---+
85- | |
86- | ASIC |
87- | |
88- +------+
89-
90-.. _Trap-Types:
91-
92-Trap Types
93-==========
94-
95-The ``devlink-trap`` mechanism supports the following packet trap types:
96-
97- * ``drop``: Trapped packets were dropped by the underlying device. Packets
98- are only processed by ``devlink`` and not injected to the kernel's Rx path.
99- The trap action (see :ref:`Trap-Actions`) can be changed.
100- * ``exception``: Trapped packets were not forwarded as intended by the
101- underlying device due to an exception (e.g., TTL error, missing neighbour
102- entry) and trapped to the control plane for resolution. Packets are
103- processed by ``devlink`` and injected to the kernel's Rx path. Changing the
104- action of such traps is not allowed, as it can easily break the control
105- plane.
106-
107-.. _Trap-Actions:
108-
109-Trap Actions
110-============
111-
112-The ``devlink-trap`` mechanism supports the following packet trap actions:
113-
114- * ``trap``: The sole copy of the packet is sent to the CPU.
115- * ``drop``: The packet is dropped by the underlying device and a copy is not
116- sent to the CPU.
117-
118-Generic Packet Traps
119-====================
120-
121-Generic packet traps are used to describe traps that trap well-defined packets
122-or packets that are trapped due to well-defined conditions (e.g., TTL error).
123-Such traps can be shared by multiple device drivers and their description must
124-be added to the following table:
125-
126-.. list-table:: List of Generic Packet Traps
127- :widths: 5 5 90
128-
129- * - Name
130- - Type
131- - Description
132- * - ``source_mac_is_multicast``
133- - ``drop``
134- - Traps incoming packets that the device decided to drop because of a
135- multicast source MAC
136- * - ``vlan_tag_mismatch``
137- - ``drop``
138- - Traps incoming packets that the device decided to drop in case of VLAN
139- tag mismatch: The ingress bridge port is not configured with a PVID and
140- the packet is untagged or prio-tagged
141- * - ``ingress_vlan_filter``
142- - ``drop``
143- - Traps incoming packets that the device decided to drop in case they are
144- tagged with a VLAN that is not configured on the ingress bridge port
145- * - ``ingress_spanning_tree_filter``
146- - ``drop``
147- - Traps incoming packets that the device decided to drop in case the STP
148- state of the ingress bridge port is not "forwarding"
149- * - ``port_list_is_empty``
150- - ``drop``
151- - Traps packets that the device decided to drop in case they need to be
152- flooded (e.g., unknown unicast, unregistered multicast) and there are
153- no ports the packets should be flooded to
154- * - ``port_loopback_filter``
155- - ``drop``
156- - Traps packets that the device decided to drop in case after layer 2
157- forwarding the only port from which they should be transmitted through
158- is the port from which they were received
159- * - ``blackhole_route``
160- - ``drop``
161- - Traps packets that the device decided to drop in case they hit a
162- blackhole route
163- * - ``ttl_value_is_too_small``
164- - ``exception``
165- - Traps unicast packets that should be forwarded by the device whose TTL
166- was decremented to 0 or less
167- * - ``tail_drop``
168- - ``drop``
169- - Traps packets that the device decided to drop because they could not be
170- enqueued to a transmission queue which is full
171-
172-Driver-specific Packet Traps
173-============================
174-
175-Device drivers can register driver-specific packet traps, but these must be
176-clearly documented. Such traps can correspond to device-specific exceptions and
177-help debug packet drops caused by these exceptions. The following list includes
178-links to the description of driver-specific traps registered by various device
179-drivers:
180-
181- * :doc:`/devlink-trap-netdevsim`
182-
183-Generic Packet Trap Groups
184-==========================
185-
186-Generic packet trap groups are used to aggregate logically related packet
187-traps. These groups allow the user to batch operations such as setting the trap
188-action of all member traps. In addition, ``devlink-trap`` can report aggregated
189-per-group packets and bytes statistics, in case per-trap statistics are too
190-narrow. The description of these groups must be added to the following table:
191-
192-.. list-table:: List of Generic Packet Trap Groups
193- :widths: 10 90
194-
195- * - Name
196- - Description
197- * - ``l2_drops``
198- - Contains packet traps for packets that were dropped by the device during
199- layer 2 forwarding (i.e., bridge)
200- * - ``l3_drops``
201- - Contains packet traps for packets that were dropped by the device or hit
202- an exception (e.g., TTL error) during layer 3 forwarding
203- * - ``buffer_drops``
204- - Contains packet traps for packets that were dropped by the device due to
205- an enqueue decision
206-
207-Testing
208-=======
209-
210-See ``tools/testing/selftests/drivers/net/netdevsim/devlink_trap.sh`` for a
211-test covering the core infrastructure. Test cases should be added for any new
212-functionality.
213-
214-Device drivers should focus their tests on device-specific functionality, such
215-as the triggering of supported packet traps.
216diff --git a/Documentation/networking/devlink-health.txt b/Documentation/networking/devlink/devlink-health.txt
217similarity index 100%
218rename from Documentation/networking/devlink-health.txt
219rename to Documentation/networking/devlink/devlink-health.txt
220diff --git a/Documentation/networking/devlink-info-versions.rst b/Documentation/networking/devlink/devlink-info-versions.rst
221similarity index 100%
222rename from Documentation/networking/devlink-info-versions.rst
223rename to Documentation/networking/devlink/devlink-info-versions.rst
224diff --git a/Documentation/networking/devlink-params-bnxt.txt b/Documentation/networking/devlink/devlink-params-bnxt.txt
225similarity index 100%
226rename from Documentation/networking/devlink-params-bnxt.txt
227rename to Documentation/networking/devlink/devlink-params-bnxt.txt
228diff --git a/Documentation/networking/devlink-params-mlx5.txt b/Documentation/networking/devlink/devlink-params-mlx5.txt
229similarity index 100%
230rename from Documentation/networking/devlink-params-mlx5.txt
231rename to Documentation/networking/devlink/devlink-params-mlx5.txt
232diff --git a/Documentation/networking/devlink-params-mlxsw.txt b/Documentation/networking/devlink/devlink-params-mlxsw.txt
233similarity index 100%
234rename from Documentation/networking/devlink-params-mlxsw.txt
235rename to Documentation/networking/devlink/devlink-params-mlxsw.txt
236diff --git a/Documentation/networking/devlink-params-nfp.txt b/Documentation/networking/devlink/devlink-params-nfp.txt
237similarity index 100%
238rename from Documentation/networking/devlink-params-nfp.txt
239rename to Documentation/networking/devlink/devlink-params-nfp.txt
240diff --git a/Documentation/networking/devlink-params.txt b/Documentation/networking/devlink/devlink-params.txt
241similarity index 100%
242rename from Documentation/networking/devlink-params.txt
243rename to Documentation/networking/devlink/devlink-params.txt
244diff --git a/Documentation/networking/devlink-trap-netdevsim.rst b/Documentation/networking/devlink/devlink-trap-netdevsim.rst
245similarity index 100%
246rename from Documentation/networking/devlink-trap-netdevsim.rst
247rename to Documentation/networking/devlink/devlink-trap-netdevsim.rst
248diff --git a/Documentation/networking/devlink/devlink-trap.rst b/Documentation/networking/devlink/devlink-trap.rst
249new file mode 100644
250index 0000000..3ccdbbe
251--- /dev/null
252+++ b/Documentation/networking/devlink/devlink-trap.rst
253@@ -0,0 +1,595 @@
254+.. SPDX-License-Identifier: GPL-2.0
255+
256+============
257+Devlink Trap
258+============
259+
260+Background
261+==========
262+
263+Devices capable of offloading the kernel's datapath and perform functions such
264+as bridging and routing must also be able to send specific packets to the
265+kernel (i.e., the CPU) for processing.
266+
267+For example, a device acting as a multicast-aware bridge must be able to send
268+IGMP membership reports to the kernel for processing by the bridge module.
269+Without processing such packets, the bridge module could never populate its
270+MDB.
271+
272+As another example, consider a device acting as router which has received an IP
273+packet with a TTL of 1. Upon routing the packet the device must send it to the
274+kernel so that it will route it as well and generate an ICMP Time Exceeded
275+error datagram. Without letting the kernel route such packets itself, utilities
276+such as ``traceroute`` could never work.
277+
278+The fundamental ability of sending certain packets to the kernel for processing
279+is called "packet trapping".
280+
281+Overview
282+========
283+
284+The ``devlink-trap`` mechanism allows capable device drivers to register their
285+supported packet traps with ``devlink`` and report trapped packets to
286+``devlink`` for further analysis.
287+
288+Upon receiving trapped packets, ``devlink`` will perform a per-trap packets and
289+bytes accounting and potentially report the packet to user space via a netlink
290+event along with all the provided metadata (e.g., trap reason, timestamp, input
291+port). This is especially useful for drop traps (see :ref:`Trap-Types`)
292+as it allows users to obtain further visibility into packet drops that would
293+otherwise be invisible.
294+
295+The following diagram provides a general overview of ``devlink-trap``::
296+
297+ Netlink event: Packet w/ metadata
298+ Or a summary of recent drops
299+ ^
300+ |
301+ Userspace |
302+ +---------------------------------------------------+
303+ Kernel |
304+ |
305+ +-------+--------+
306+ | |
307+ | drop_monitor |
308+ | |
309+ +-------^--------+
310+ |
311+ | Non-control traps
312+ |
313+ +----+----+
314+ | | Kernel's Rx path
315+ | devlink | (non-drop traps)
316+ | |
317+ +----^----+ ^
318+ | |
319+ +-----------+
320+ |
321+ +-------+-------+
322+ | |
323+ | Device driver |
324+ | |
325+ +-------^-------+
326+ Kernel |
327+ +---------------------------------------------------+
328+ Hardware |
329+ | Trapped packet
330+ |
331+ +--+---+
332+ | |
333+ | ASIC |
334+ | |
335+ +------+
336+
337+.. _Trap-Types:
338+
339+Trap Types
340+==========
341+
342+The ``devlink-trap`` mechanism supports the following packet trap types:
343+
344+ * ``drop``: Trapped packets were dropped by the underlying device. Packets
345+ are only processed by ``devlink`` and not injected to the kernel's Rx path.
346+ The trap action (see :ref:`Trap-Actions`) can be changed.
347+ * ``exception``: Trapped packets were not forwarded as intended by the
348+ underlying device due to an exception (e.g., TTL error, missing neighbour
349+ entry) and trapped to the control plane for resolution. Packets are
350+ processed by ``devlink`` and injected to the kernel's Rx path. Changing the
351+ action of such traps is not allowed, as it can easily break the control
352+ plane.
353+ * ``control``: Trapped packets were trapped by the device because these are
354+ control packets required for the correct functioning of the control plane.
355+ For example, ARP request and IGMP query packets. Packets are injected to
356+ the kernel's Rx path, but not reported to the kernel's drop monitor.
357+ Changing the action of such traps is not allowed, as it can easily break
358+ the control plane.
359+
360+.. _Trap-Actions:
361+
362+Trap Actions
363+============
364+
365+The ``devlink-trap`` mechanism supports the following packet trap actions:
366+
367+ * ``trap``: The sole copy of the packet is sent to the CPU.
368+ * ``drop``: The packet is dropped by the underlying device and a copy is not
369+ sent to the CPU.
370+
371+Generic Packet Traps
372+====================
373+
374+Generic packet traps are used to describe traps that trap well-defined packets
375+or packets that are trapped due to well-defined conditions (e.g., TTL error).
376+Such traps can be shared by multiple device drivers and their description must
377+be added to the following table:
378+
379+.. list-table:: List of Generic Packet Traps
380+ :widths: 5 5 90
381+
382+ * - Name
383+ - Type
384+ - Description
385+ * - ``source_mac_is_multicast``
386+ - ``drop``
387+ - Traps incoming packets that the device decided to drop because of a
388+ multicast source MAC
389+ * - ``vlan_tag_mismatch``
390+ - ``drop``
391+ - Traps incoming packets that the device decided to drop in case of VLAN
392+ tag mismatch: The ingress bridge port is not configured with a PVID and
393+ the packet is untagged or prio-tagged
394+ * - ``ingress_vlan_filter``
395+ - ``drop``
396+ - Traps incoming packets that the device decided to drop in case they are
397+ tagged with a VLAN that is not configured on the ingress bridge port
398+ * - ``ingress_spanning_tree_filter``
399+ - ``drop``
400+ - Traps incoming packets that the device decided to drop in case the STP
401+ state of the ingress bridge port is not "forwarding"
402+ * - ``port_list_is_empty``
403+ - ``drop``
404+ - Traps packets that the device decided to drop in case they need to be
405+ flooded (e.g., unknown unicast, unregistered multicast) and there are
406+ no ports the packets should be flooded to
407+ * - ``port_loopback_filter``
408+ - ``drop``
409+ - Traps packets that the device decided to drop in case after layer 2
410+ forwarding the only port through which they should be transmitted
411+ is the port from which they were received
412+ * - ``blackhole_route``
413+ - ``drop``
414+ - Traps packets that the device decided to drop in case they hit a
415+ blackhole route
416+ * - ``ttl_value_is_too_small``
417+ - ``exception``
418+ - Traps unicast packets that should be forwarded by the device whose TTL
419+ was decremented to 0 or less
420+ * - ``tail_drop``
421+ - ``drop``
422+ - Traps packets that the device decided to drop because they could not be
423+ enqueued to a transmission queue which is full
424+ * - ``non_ip``
425+ - ``drop``
426+ - Traps packets that the device decided to drop because they need to
427+ undergo a layer 3 lookup, but are not IP or MPLS packets
428+ * - ``uc_dip_over_mc_dmac``
429+ - ``drop``
430+ - Traps packets that the device decided to drop because they need to be
431+ routed and they have a unicast destination IP and a multicast destination
432+ MAC
433+ * - ``dip_is_loopback_address``
434+ - ``drop``
435+ - Traps packets that the device decided to drop because they need to be
436+ routed and their destination IP is the loopback address (i.e., 127.0.0.0/8
437+ and ::1/128)
438+ * - ``sip_is_mc``
439+ - ``drop``
440+ - Traps packets that the device decided to drop because they need to be
441+ routed and their source IP is multicast (i.e., 224.0.0.0/4 and ff00::/8)
442+ * - ``sip_is_loopback_address``
443+ - ``drop``
444+ - Traps packets that the device decided to drop because they need to be
445+ routed and their source IP is the loopback address (i.e., 127.0.0.0/8 and ::1/128)
446+ * - ``ip_header_corrupted``
447+ - ``drop``
448+ - Traps packets that the device decided to drop because they need to be
449+ routed and their IP header is corrupted: wrong checksum, wrong IP version
450+ or too short Internet Header Length (IHL)
451+ * - ``ipv4_sip_is_limited_bc``
452+ - ``drop``
453+ - Traps packets that the device decided to drop because they need to be
454+ routed and their source IP is limited broadcast (i.e., 255.255.255.255/32)
455+ * - ``ipv6_mc_dip_reserved_scope``
456+ - ``drop``
457+ - Traps IPv6 packets that the device decided to drop because they need to
458+ be routed and their IPv6 multicast destination IP has a reserved scope
459+ (i.e., ffx0::/16)
460+ * - ``ipv6_mc_dip_interface_local_scope``
461+ - ``drop``
462+ - Traps IPv6 packets that the device decided to drop because they need to
463+ be routed and their IPv6 multicast destination IP has an interface-local scope
464+ (i.e., ffx1::/16)
465+ * - ``mtu_value_is_too_small``
466+ - ``exception``
467+ - Traps packets that should have been routed by the device, but were bigger
468+ than the MTU of the egress interface
469+ * - ``unresolved_neigh``
470+ - ``exception``
471+ - Traps packets that did not have a matching IP neighbour after routing
472+ * - ``mc_reverse_path_forwarding``
473+ - ``exception``
474+ - Traps multicast IP packets that failed reverse-path forwarding (RPF)
475+ check during multicast routing
476+ * - ``reject_route``
477+ - ``exception``
478+ - Traps packets that hit reject routes (i.e., "unreachable", "prohibit")
479+ * - ``ipv4_lpm_miss``
480+ - ``exception``
481+ - Traps unicast IPv4 packets that did not match any route
482+ * - ``ipv6_lpm_miss``
483+ - ``exception``
484+ - Traps unicast IPv6 packets that did not match any route
485+ * - ``non_routable_packet``
486+ - ``drop``
487+ - Traps packets that the device decided to drop because they are not
488+ supposed to be routed. For example, IGMP queries can be flooded by the
489+ device in layer 2 and reach the router. Such packets should not be
490+ routed and instead dropped
491+ * - ``decap_error``
492+ - ``exception``
493+ - Traps NVE and IPinIP packets that the device decided to drop because of
494+ failure during decapsulation (e.g., packet being too short, reserved
495+ bits set in VXLAN header)
496+ * - ``overlay_smac_is_mc``
497+ - ``drop``
498+ - Traps NVE packets that the device decided to drop because their overlay
499+ source MAC is multicast
500+ * - ``ingress_flow_action_drop``
501+ - ``drop``
502+ - Traps packets dropped during processing of ingress flow action drop
503+ * - ``egress_flow_action_drop``
504+ - ``drop``
505+ - Traps packets dropped during processing of egress flow action drop
506+ * - ``stp``
507+ - ``control``
508+ - Traps STP packets
509+ * - ``lacp``
510+ - ``control``
511+ - Traps LACP packets
512+ * - ``lldp``
513+ - ``control``
514+ - Traps LLDP packets
515+ * - ``igmp_query``
516+ - ``control``
517+ - Traps IGMP Membership Query packets
518+ * - ``igmp_v1_report``
519+ - ``control``
520+ - Traps IGMP Version 1 Membership Report packets
521+ * - ``igmp_v2_report``
522+ - ``control``
523+ - Traps IGMP Version 2 Membership Report packets
524+ * - ``igmp_v3_report``
525+ - ``control``
526+ - Traps IGMP Version 3 Membership Report packets
527+ * - ``igmp_v2_leave``
528+ - ``control``
529+ - Traps IGMP Version 2 Leave Group packets
530+ * - ``mld_query``
531+ - ``control``
532+ - Traps MLD Multicast Listener Query packets
533+ * - ``mld_v1_report``
534+ - ``control``
535+ - Traps MLD Version 1 Multicast Listener Report packets
536+ * - ``mld_v2_report``
537+ - ``control``
538+ - Traps MLD Version 2 Multicast Listener Report packets
539+ * - ``mld_v1_done``
540+ - ``control``
541+ - Traps MLD Version 1 Multicast Listener Done packets
542+ * - ``ipv4_dhcp``
543+ - ``control``
544+ - Traps IPv4 DHCP packets
545+ * - ``ipv6_dhcp``
546+ - ``control``
547+ - Traps IPv6 DHCP packets
548+ * - ``arp_request``
549+ - ``control``
550+ - Traps ARP request packets
551+ * - ``arp_response``
552+ - ``control``
553+ - Traps ARP response packets
554+ * - ``arp_overlay``
555+ - ``control``
556+ - Traps NVE-decapsulated ARP packets that reached the overlay network.
557+ This is required, for example, when the address that needs to be
558+ resolved is a local address
559+ * - ``ipv6_neigh_solicit``
560+ - ``control``
561+ - Traps IPv6 Neighbour Solicitation packets
562+ * - ``ipv6_neigh_advert``
563+ - ``control``
564+ - Traps IPv6 Neighbour Advertisement packets
565+ * - ``ipv4_bfd``
566+ - ``control``
567+ - Traps IPv4 BFD packets
568+ * - ``ipv6_bfd``
569+ - ``control``
570+ - Traps IPv6 BFD packets
571+ * - ``ipv4_ospf``
572+ - ``control``
573+ - Traps IPv4 OSPF packets
574+ * - ``ipv6_ospf``
575+ - ``control``
576+ - Traps IPv6 OSPF packets
577+ * - ``ipv4_bgp``
578+ - ``control``
579+ - Traps IPv4 BGP packets
580+ * - ``ipv6_bgp``
581+ - ``control``
582+ - Traps IPv6 BGP packets
583+ * - ``ipv4_vrrp``
584+ - ``control``
585+ - Traps IPv4 VRRP packets
586+ * - ``ipv6_vrrp``
587+ - ``control``
588+ - Traps IPv6 VRRP packets
589+ * - ``ipv4_pim``
590+ - ``control``
591+ - Traps IPv4 PIM packets
592+ * - ``ipv6_pim``
593+ - ``control``
594+ - Traps IPv6 PIM packets
595+ * - ``uc_loopback``
596+ - ``control``
597+ - Traps unicast packets that need to be routed through the same layer 3
598+ interface from which they were received. Such packets are routed by the
599+ kernel, but also cause it to potentially generate ICMP redirect packets
600+ * - ``local_route``
601+ - ``control``
602+ - Traps unicast packets that hit a local route and need to be locally
603+ delivered
604+ * - ``external_route``
605+ - ``control``
606+ - Traps packets that should be routed through an external interface (e.g.,
607+ management interface) that does not belong to the same device (e.g.,
608+ switch ASIC) as the ingress interface
609+ * - ``ipv6_uc_dip_link_local_scope``
610+ - ``control``
611+ - Traps unicast IPv6 packets that need to be routed and have a destination
612+ IP address with a link-local scope (i.e., fe80::/10). The trap allows
613+ device drivers to avoid programming link-local routes, but still receive
614+ packets for local delivery
615+ * - ``ipv6_dip_all_nodes``
616+ - ``control``
617+ - Traps IPv6 packets whose destination IP address is the "All Nodes
618+ Address" (i.e., ff02::1)
619+ * - ``ipv6_dip_all_routers``
620+ - ``control``
621+ - Traps IPv6 packets whose destination IP address is the "All Routers
622+ Address" (i.e., ff02::2)
623+ * - ``ipv6_router_solicit``
624+ - ``control``
625+ - Traps IPv6 Router Solicitation packets
626+ * - ``ipv6_router_advert``
627+ - ``control``
628+ - Traps IPv6 Router Advertisement packets
629+ * - ``ipv6_redirect``
630+ - ``control``
631+ - Traps IPv6 Redirect Message packets
632+ * - ``ipv4_router_alert``
633+ - ``control``
634+ - Traps IPv4 packets that need to be routed and include the Router Alert
635+ option. Such packets need to be locally delivered to raw sockets that
636+ have the IP_ROUTER_ALERT socket option set
637+ * - ``ipv6_router_alert``
638+ - ``control``
639+ - Traps IPv6 packets that need to be routed and include the Router Alert
640+ option in their Hop-by-Hop extension header. Such packets need to be
641+ locally delivered to raw sockets that have the IPV6_ROUTER_ALERT socket
642+ option set
643+ * - ``ptp_event``
644+ - ``control``
645+ - Traps PTP time-critical event messages (Sync, Delay_req, Pdelay_Req and
646+ Pdelay_Resp)
647+ * - ``ptp_general``
648+ - ``control``
649+ - Traps PTP general messages (Announce, Follow_Up, Delay_Resp,
650+ Pdelay_Resp_Follow_Up, management and signaling)
651+ * - ``flow_action_sample``
652+ - ``control``
653+ - Traps packets sampled during processing of flow action sample (e.g., via
654+ tc's sample action)
655+ * - ``flow_action_trap``
656+ - ``control``
657+ - Traps packets logged during processing of flow action trap (e.g., via
658+ tc's trap action)
659+ * - ``early_drop``
660+ - ``drop``
661+ - Traps packets dropped due to the RED (Random Early Detection) algorithm
662+ (i.e., early drops)
663+ * - ``vxlan_parsing``
664+ - ``drop``
665+ - Traps packets dropped due to an error in the VXLAN header parsing, which
666+ might be caused by packet truncation or by the I flag not being set.
667+ * - ``llc_snap_parsing``
668+ - ``drop``
669+ - Traps packets dropped due to an error in the LLC+SNAP header parsing
670+ * - ``vlan_parsing``
671+ - ``drop``
672+ - Traps packets dropped due to an error in the VLAN header parsing. Could
673+ include unexpected packet truncation.
674+ * - ``pppoe_ppp_parsing``
675+ - ``drop``
676+ - Traps packets dropped due to an error in the PPPoE+PPP header parsing.
677+ This could include finding a session ID of 0xFFFF (which is reserved and
678+ not for use), a PPPoE length which is larger than the frame received or
679+ any common error on this type of header
680+ * - ``mpls_parsing``
681+ - ``drop``
682+ - Traps packets dropped due to an error in the MPLS header parsing which
683+ could include unexpected header truncation
684+ * - ``arp_parsing``
685+ - ``drop``
686+ - Traps packets dropped due to an error in the ARP header parsing
687+ * - ``ip_1_parsing``
688+ - ``drop``
689+ - Traps packets dropped due to an error in the first IP header parsing.
690+ This packet trap could include packets which do not pass an IP checksum
691+ check or a header length check (a minimum of 20 bytes), or which suffer
692+ from packet truncation such that the total length field exceeds the
693+ received packet length, etc.
694+ * - ``ip_n_parsing``
695+ - ``drop``
696+ - Traps packets dropped due to an error in the parsing of the last IP
697+ header (the inner one in case of an IP over IP tunnel). The same common
698+ error checking is performed here as for the ip_1_parsing trap
699+ * - ``gre_parsing``
700+ - ``drop``
701+ - Traps packets dropped due to an error in the GRE header parsing
702+ * - ``udp_parsing``
703+ - ``drop``
704+ - Traps packets dropped due to an error in the UDP header parsing.
705+ This packet trap could include checksum errors, an improper UDP
706+ length detected (smaller than 8 bytes) or detection of header
707+ truncation.
708+ * - ``tcp_parsing``
709+ - ``drop``
710+ - Traps packets dropped due to an error in the TCP header parsing.
711+ This could include TCP checksum errors, improper combination of SYN, FIN
712+ and/or RESET etc.
713+ * - ``ipsec_parsing``
714+ - ``drop``
715+ - Traps packets dropped due to an error in the IPSEC header parsing
716+ * - ``sctp_parsing``
717+ - ``drop``
718+ - Traps packets dropped due to an error in the SCTP header parsing.
719+ This would mean that port number 0 was used or that the header is
720+ truncated.
721+ * - ``dccp_parsing``
722+ - ``drop``
723+ - Traps packets dropped due to an error in the DCCP header parsing
724+ * - ``gtp_parsing``
725+ - ``drop``
726+ - Traps packets dropped due to an error in the GTP header parsing
727+ * - ``esp_parsing``
728+ - ``drop``
729+ - Traps packets dropped due to an error in the ESP header parsing
730+ * - ``blackhole_nexthop``
731+ - ``drop``
732+ - Traps packets that the device decided to drop in case they hit a
733+ blackhole nexthop
734+ * - ``dmac_filter``
735+ - ``drop``
736+ - Traps incoming packets that the device decided to drop because
737+ the destination MAC is not configured in the MAC table and
738+ the interface is not in promiscuous mode
739+
740+Driver-specific Packet Traps
741+============================
742+
743+Device drivers can register driver-specific packet traps, but these must be
744+clearly documented. Such traps can correspond to device-specific exceptions and
745+help debug packet drops caused by these exceptions. The following list includes
746+links to the description of driver-specific traps registered by various device
747+drivers:
748+
749+ * :doc:`/devlink-trap-netdevsim`
750+
751+Generic Packet Trap Groups
752+==========================
753+
754+Generic packet trap groups are used to aggregate logically related packet
755+traps. These groups allow the user to batch operations such as setting the trap
756+action of all member traps. In addition, ``devlink-trap`` can report aggregated
757+per-group packets and bytes statistics, in case per-trap statistics are too
758+narrow. The description of these groups must be added to the following table:
759+
760+.. list-table:: List of Generic Packet Trap Groups
761+ :widths: 10 90
762+
763+ * - Name
764+ - Description
765+ * - ``l2_drops``
766+ - Contains packet traps for packets that were dropped by the device during
767+ layer 2 forwarding (i.e., bridge)
768+ * - ``l3_drops``
769+ - Contains packet traps for packets that were dropped by the device during
770+ layer 3 forwarding
771+ * - ``l3_exceptions``
772+ - Contains packet traps for packets that hit an exception (e.g., TTL
773+ error) during layer 3 forwarding
774+ * - ``buffer_drops``
775+ - Contains packet traps for packets that were dropped by the device due to
776+ an enqueue decision
777+ * - ``tunnel_drops``
778+ - Contains packet traps for packets that were dropped by the device during
779+ tunnel encapsulation / decapsulation
780+ * - ``acl_drops``
781+ - Contains packet traps for packets that were dropped by the device during
782+ ACL processing
783+ * - ``stp``
784+ - Contains packet traps for STP packets
785+ * - ``lacp``
786+ - Contains packet traps for LACP packets
787+ * - ``lldp``
788+ - Contains packet traps for LLDP packets
789+ * - ``mc_snooping``
790+ - Contains packet traps for IGMP and MLD packets required for multicast
791+ snooping
792+ * - ``dhcp``
793+ - Contains packet traps for DHCP packets
794+ * - ``neigh_discovery``
795+ - Contains packet traps for neighbour discovery packets (e.g., ARP, IPv6
796+ ND)
797+ * - ``bfd``
798+ - Contains packet traps for BFD packets
799+ * - ``ospf``
800+ - Contains packet traps for OSPF packets
801+ * - ``bgp``
802+ - Contains packet traps for BGP packets
803+ * - ``vrrp``
804+ - Contains packet traps for VRRP packets
805+ * - ``pim``
806+ - Contains packet traps for PIM packets
807+ * - ``uc_loopback``
808+ - Contains a packet trap for unicast loopback packets (i.e.,
809+ ``uc_loopback``). This trap is singled out because in cases such as
810+ one-armed router it will be constantly triggered. To limit the impact on
811+ the CPU usage, a packet trap policer with a low rate can be bound to the
812+ group without affecting other traps
813+ * - ``local_delivery``
814+ - Contains packet traps for packets that should be locally delivered after
815+ routing, but do not match more specific packet traps (e.g.,
816+ ``ipv4_bgp``)
817+ * - ``external_delivery``
818+ - Contains packet traps for packets that should be routed through an
819+ external interface (e.g., management interface) that does not belong to
820+ the same device (e.g., switch ASIC) as the ingress interface
821+ * - ``ipv6``
822+ - Contains packet traps for various IPv6 control packets (e.g., Router
823+ Advertisements)
824+ * - ``ptp_event``
825+ - Contains packet traps for PTP time-critical event messages (Sync,
826+ Delay_req, Pdelay_Req and Pdelay_Resp)
827+ * - ``ptp_general``
828+ - Contains packet traps for PTP general messages (Announce, Follow_Up,
829+ Delay_Resp, Pdelay_Resp_Follow_Up, management and signaling)
830+ * - ``acl_sample``
831+ - Contains packet traps for packets that were sampled by the device during
832+ ACL processing
833+ * - ``acl_trap``
834+ - Contains packet traps for packets that were trapped (logged) by the
835+ device during ACL processing
836+ * - ``parser_error_drops``
837+ - Contains packet traps for packets that were marked by the device during
838+ parsing as erroneous
839+
840+Testing
841+=======
842+
843+See ``tools/testing/selftests/drivers/net/netdevsim/devlink_trap.sh`` for a
844+test covering the core infrastructure. Test cases should be added for any new
845+functionality.
846+
847+Device drivers should focus their tests on device-specific functionality, such
848+as the triggering of supported packet traps.
849diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst
850new file mode 100644
851index 0000000..1252c2a
852--- /dev/null
853+++ b/Documentation/networking/devlink/index.rst
854@@ -0,0 +1,14 @@
855+Linux Devlink Documentation
856+===========================
857+
858+devlink is an API to expose device information and resources not directly
859+related to any device class, such as chip-wide/switch-ASIC-wide configuration.
860+
861+Contents:
862+
863+.. toctree::
864+ :maxdepth: 1
865+
866+ devlink-info-versions
867+ devlink-trap
868+ devlink-trap-netdevsim
869diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
870index d4dca42..ad58761 100644
871--- a/Documentation/networking/index.rst
872+++ b/Documentation/networking/index.rst
873@@ -13,9 +13,8 @@ Contents:
874 can_ucan_protocol
875 device_drivers/index
876 dsa/index
877- devlink-info-versions
878- devlink-trap
879- devlink-trap-netdevsim
880+ devlink/index
881+ ethtool-netlink
882 ieee802154
883 j1939
884 kapi
885diff --git a/MAINTAINERS b/MAINTAINERS
886index 349639d..d8fd753 100644
887--- a/MAINTAINERS
888+++ b/MAINTAINERS
889@@ -4836,6 +4836,7 @@ S: Supported
890 F: net/core/devlink.c
891 F: include/net/devlink.h
892 F: include/uapi/linux/devlink.h
893+F: Documentation/networking/devlink
894
895 DIALOG SEMICONDUCTOR DRIVERS
896 M: Support Opensource <support.opensource@diasemi.com>
897diff --git a/debian.bluefield/config/config.common.ubuntu b/debian.bluefield/config/config.common.ubuntu
898index f1fcfae..86c5a87 100644
899--- a/debian.bluefield/config/config.common.ubuntu
900+++ b/debian.bluefield/config/config.common.ubuntu
901@@ -2419,26 +2419,7 @@ CONFIG_MISC_FILESYSTEMS=y
902 # CONFIG_MISC_RTSX_USB is not set
903 # CONFIG_MLX4_EN is not set
904 # CONFIG_MLX4_INFINIBAND is not set
905-CONFIG_MLX5_ACCEL=y
906-CONFIG_MLX5_CORE=m
907-CONFIG_MLX5_CORE_EN=y
908-CONFIG_MLX5_CORE_EN_DCB=y
909-CONFIG_MLX5_CORE_IPOIB=y
910-CONFIG_MLX5_EN_ARFS=y
911-CONFIG_MLX5_EN_IPSEC=y
912-CONFIG_MLX5_EN_RXNFC=y
913-CONFIG_MLX5_EN_TLS=y
914-CONFIG_MLX5_ESWITCH=y
915-CONFIG_MLX5_FPGA=y
916-CONFIG_MLX5_FPGA_IPSEC=y
917-CONFIG_MLX5_FPGA_TLS=y
918-CONFIG_MLX5_INFINIBAND=m
919-CONFIG_MLX5_IPSEC=y
920-CONFIG_MLX5_MDEV=y
921-CONFIG_MLX5_MPFS=y
922-CONFIG_MLX5_SW_STEERING=y
923-CONFIG_MLX5_TC_CT=y
924-CONFIG_MLX5_TLS=y
925+# CONFIG_MLX5_CORE is not set
926 # CONFIG_MLX90614 is not set
927 # CONFIG_MLX90632 is not set
928 CONFIG_MLXBF_BOOTCTL=m
929@@ -2448,7 +2429,7 @@ CONFIG_MLXBF_PKA=m
930 CONFIG_MLXBF_PMC=m
931 CONFIG_MLXBF_TMFIFO=y
932 CONFIG_MLXBF_TRIO=m
933-CONFIG_MLXFW=m
934+# CONFIG_MLXFW is not set
935 CONFIG_MLXREG_HOTPLUG=m
936 CONFIG_MLXREG_IO=m
937 # CONFIG_MLXSW_CORE is not set
938@@ -2776,7 +2757,7 @@ CONFIG_NET_CLS_TCINDEX=m
939 CONFIG_NET_CLS_U32=m
940 CONFIG_NET_CORE=y
941 CONFIG_NET_DEVLINK=y
942-CONFIG_NET_DROP_MONITOR=y
943+# CONFIG_NET_DROP_MONITOR is not set
944 # CONFIG_NET_DSA is not set
945 CONFIG_NET_EGRESS=y
946 CONFIG_NET_EMATCH=y
947@@ -2867,7 +2848,11 @@ CONFIG_NET_UDP_TUNNEL=m
948 # CONFIG_NET_VENDOR_ARC is not set
949 # CONFIG_NET_VENDOR_ATHEROS is not set
950 # CONFIG_NET_VENDOR_AURORA is not set
951-# CONFIG_NET_VENDOR_BROADCOM is not set
952+CONFIG_BNXT=m
953+# CONFIG_BNXT_SRIOV is not set
954+# CONFIG_BNXT_FLOWER_OFFLOAD is not set
955+# CONFIG_BNXT_DCB is not set
956+# CONFIG_BNXT_HWMON is not set
957 # CONFIG_NET_VENDOR_BROCADE is not set
958 # CONFIG_NET_VENDOR_CADENCE is not set
959 # CONFIG_NET_VENDOR_CAVIUM is not set
960diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
961index 2d817ba..58b69b4 100644
962--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
963+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
964@@ -16,7 +16,8 @@
965 #include "bnxt_devlink.h"
966
967 static int bnxt_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
968- struct devlink_fmsg *fmsg)
969+ struct devlink_fmsg *fmsg,
970+ struct netlink_ext_ack *extack)
971 {
972 struct bnxt *bp = devlink_health_reporter_priv(reporter);
973 u32 val, health_status;
974@@ -60,7 +61,8 @@ static const struct devlink_health_reporter_ops bnxt_dl_fw_reporter_ops = {
975 };
976
977 static int bnxt_fw_reset_recover(struct devlink_health_reporter *reporter,
978- void *priv_ctx)
979+ void *priv_ctx,
980+ struct netlink_ext_ack *extack)
981 {
982 struct bnxt *bp = devlink_health_reporter_priv(reporter);
983
984@@ -78,7 +80,8 @@ struct devlink_health_reporter_ops bnxt_dl_fw_reset_reporter_ops = {
985 };
986
987 static int bnxt_fw_fatal_recover(struct devlink_health_reporter *reporter,
988- void *priv_ctx)
989+ void *priv_ctx,
990+ struct netlink_ext_ack *extack)
991 {
992 struct bnxt *bp = devlink_health_reporter_priv(reporter);
993 struct bnxt_fw_reporter_ctx *fw_reporter_ctx = priv_ctx;
994@@ -115,7 +118,7 @@ void bnxt_dl_fw_reporters_create(struct bnxt *bp)
995 health->fw_reset_reporter =
996 devlink_health_reporter_create(bp->dl,
997 &bnxt_dl_fw_reset_reporter_ops,
998- 0, true, bp);
999+ 0, bp);
1000 if (IS_ERR(health->fw_reset_reporter)) {
1001 netdev_warn(bp->dev, "Failed to create FW fatal health reporter, rc = %ld\n",
1002 PTR_ERR(health->fw_reset_reporter));
1003@@ -131,7 +134,7 @@ err_recovery:
1004 health->fw_reporter =
1005 devlink_health_reporter_create(bp->dl,
1006 &bnxt_dl_fw_reporter_ops,
1007- 0, false, bp);
1008+ 0, bp);
1009 if (IS_ERR(health->fw_reporter)) {
1010 netdev_warn(bp->dev, "Failed to create FW health reporter, rc = %ld\n",
1011 PTR_ERR(health->fw_reporter));
1012@@ -147,7 +150,7 @@ err_recovery:
1013 health->fw_fatal_reporter =
1014 devlink_health_reporter_create(bp->dl,
1015 &bnxt_dl_fw_fatal_reporter_ops,
1016- 0, true, bp);
1017+ 0, bp);
1018 if (IS_ERR(health->fw_fatal_reporter)) {
1019 netdev_warn(bp->dev, "Failed to create FW fatal health reporter, rc = %ld\n",
1020 PTR_ERR(health->fw_fatal_reporter));
1021@@ -433,6 +436,7 @@ static const struct devlink_param bnxt_dl_port_params[] = {
1022
1023 int bnxt_dl_register(struct bnxt *bp)
1024 {
1025+ struct devlink_port_attrs attrs = {};
1026 struct devlink *dl;
1027 int rc;
1028
1029@@ -466,17 +470,11 @@ int bnxt_dl_register(struct bnxt *bp)
1030 if (!BNXT_PF(bp))
1031 return 0;
1032
1033- rc = devlink_params_register(dl, bnxt_dl_params,
1034- ARRAY_SIZE(bnxt_dl_params));
1035- if (rc) {
1036- netdev_warn(bp->dev, "devlink_params_register failed. rc=%d",
1037- rc);
1038- goto err_dl_unreg;
1039- }
1040-
1041- devlink_port_attrs_set(&bp->dl_port, DEVLINK_PORT_FLAVOUR_PHYSICAL,
1042- bp->pf.port_id, false, 0,
1043- bp->switch_id, sizeof(bp->switch_id));
1044+ attrs.flavour = DEVLINK_PORT_FLAVOUR_PHYSICAL;
1045+ attrs.phys.port_number = bp->pf.port_id;
1046+ memcpy(attrs.switch_id.id, bp->switch_id, sizeof(bp->switch_id));
1047+ attrs.switch_id.id_len = sizeof(bp->switch_id);
1048+ devlink_port_attrs_set(&bp->dl_port, &attrs);
1049 rc = devlink_port_register(dl, &bp->dl_port, bp->pf.port_id);
1050 if (rc) {
1051 netdev_err(bp->dev, "devlink_port_register failed");
1052diff --git a/drivers/net/ethernet/mellanox/mlx4/crdump.c b/drivers/net/ethernet/mellanox/mlx4/crdump.c
1053index eaf08f7..2700628 100644
1054--- a/drivers/net/ethernet/mellanox/mlx4/crdump.c
1055+++ b/drivers/net/ethernet/mellanox/mlx4/crdump.c
1056@@ -38,8 +38,18 @@
1057 #define CR_ENABLE_BIT_OFFSET 0xF3F04
1058 #define MAX_NUM_OF_DUMPS_TO_STORE (8)
1059
1060-static const char *region_cr_space_str = "cr-space";
1061-static const char *region_fw_health_str = "fw-health";
1062+static const char * const region_cr_space_str = "cr-space";
1063+static const char * const region_fw_health_str = "fw-health";
1064+
1065+static const struct devlink_region_ops region_cr_space_ops = {
1066+ .name = region_cr_space_str,
1067+ .destructor = &kvfree,
1068+};
1069+
1070+static const struct devlink_region_ops region_fw_health_ops = {
1071+ .name = region_fw_health_str,
1072+ .destructor = &kvfree,
1073+};
1074
1075 /* Set to true in case cr enable bit was set to true before crdump */
1076 static bool crdump_enbale_bit_set;
1077@@ -99,7 +109,7 @@ static void mlx4_crdump_collect_crspace(struct mlx4_dev *dev,
1078 readl(cr_space + offset);
1079
1080 err = devlink_region_snapshot_create(crdump->region_crspace,
1081- crspace_data, id, &kvfree);
1082+ crspace_data, id);
1083 if (err) {
1084 kvfree(crspace_data);
1085 mlx4_warn(dev, "crdump: devlink create %s snapshot id %d err %d\n",
1086@@ -138,7 +148,7 @@ static void mlx4_crdump_collect_fw_health(struct mlx4_dev *dev,
1087 readl(health_buf_start + offset);
1088
1089 err = devlink_region_snapshot_create(crdump->region_fw_health,
1090- health_data, id, &kvfree);
1091+ health_data, id);
1092 if (err) {
1093 kvfree(health_data);
1094 mlx4_warn(dev, "crdump: devlink create %s snapshot id %d err %d\n",
1095@@ -159,6 +169,7 @@ int mlx4_crdump_collect(struct mlx4_dev *dev)
1096 struct pci_dev *pdev = dev->persist->pdev;
1097 unsigned long cr_res_size;
1098 u8 __iomem *cr_space;
1099+ int err;
1100 u32 id;
1101
1102 if (!dev->caps.health_buffer_addrs) {
1103@@ -179,15 +190,22 @@ int mlx4_crdump_collect(struct mlx4_dev *dev)
1104 return -ENODEV;
1105 }
1106
1107- crdump_enable_crspace_access(dev, cr_space);
1108-
1109 /* Get the available snapshot ID for the dumps */
1110- id = devlink_region_shapshot_id_get(devlink);
1111+ err = devlink_region_snapshot_id_get(devlink, &id);
1112+ if (err) {
1113+ mlx4_err(dev, "crdump: devlink get snapshot id err %d\n", err);
1114+ return err;
1115+ }
1116+
1117+ crdump_enable_crspace_access(dev, cr_space);
1118
1119 /* Try to capture dumps */
1120 mlx4_crdump_collect_crspace(dev, cr_space, id);
1121 mlx4_crdump_collect_fw_health(dev, cr_space, id);
1122
1123+ /* Release reference on the snapshot id */
1124+ devlink_region_snapshot_id_put(devlink, id);
1125+
1126 crdump_disable_crspace_access(dev, cr_space);
1127
1128 iounmap(cr_space);
1129@@ -205,7 +223,7 @@ int mlx4_crdump_init(struct mlx4_dev *dev)
1130 /* Create cr-space region */
1131 crdump->region_crspace =
1132 devlink_region_create(devlink,
1133- region_cr_space_str,
1134+ &region_cr_space_ops,
1135 MAX_NUM_OF_DUMPS_TO_STORE,
1136 pci_resource_len(pdev, 0));
1137 if (IS_ERR(crdump->region_crspace))
1138@@ -216,7 +234,7 @@ int mlx4_crdump_init(struct mlx4_dev *dev)
1139 /* Create fw-health region */
1140 crdump->region_fw_health =
1141 devlink_region_create(devlink,
1142- region_fw_health_str,
1143+ &region_fw_health_ops,
1144 MAX_NUM_OF_DUMPS_TO_STORE,
1145 HEALTH_BUFFER_SIZE);
1146 if (IS_ERR(crdump->region_fw_health))
1147diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
1148index 35882d6..c079910 100644
1149--- a/drivers/net/ethernet/mellanox/mlx4/main.c
1150+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
1151@@ -3943,20 +3943,27 @@ static void mlx4_restart_one_down(struct pci_dev *pdev);
1152 static int mlx4_restart_one_up(struct pci_dev *pdev, bool reload,
1153 struct devlink *devlink);
1154
1155-static int mlx4_devlink_reload_down(struct devlink *devlink,
1156+static int mlx4_devlink_reload_down(struct devlink *devlink, bool netns_change,
1157+ enum devlink_reload_action action,
1158+ enum devlink_reload_limit limit,
1159 struct netlink_ext_ack *extack)
1160 {
1161 struct mlx4_priv *priv = devlink_priv(devlink);
1162 struct mlx4_dev *dev = &priv->dev;
1163 struct mlx4_dev_persistent *persist = dev->persist;
1164
1165+ if (netns_change) {
1166+ NL_SET_ERR_MSG_MOD(extack, "Namespace change is not supported");
1167+ return -EOPNOTSUPP;
1168+ }
1169 if (persist->num_vfs)
1170 mlx4_warn(persist->dev, "Reload performed on PF, will cause reset on operating Virtual Functions\n");
1171 mlx4_restart_one_down(persist->pdev);
1172 return 0;
1173 }
1174
1175-static int mlx4_devlink_reload_up(struct devlink *devlink,
1176+static int mlx4_devlink_reload_up(struct devlink *devlink, enum devlink_reload_action action,
1177+ enum devlink_reload_limit limit, u32 *actions_performed,
1178 struct netlink_ext_ack *extack)
1179 {
1180 struct mlx4_priv *priv = devlink_priv(devlink);
1181@@ -3964,6 +3971,7 @@ static int mlx4_devlink_reload_up(struct devlink *devlink,
1182 struct mlx4_dev_persistent *persist = dev->persist;
1183 int err;
1184
1185+ *actions_performed = BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT);
1186 err = mlx4_restart_one_up(persist->pdev, true, devlink);
1187 if (err)
1188 mlx4_err(persist->dev, "mlx4_restart_one_up failed, ret=%d\n",
1189@@ -3974,6 +3982,7 @@ static int mlx4_devlink_reload_up(struct devlink *devlink,
1190
1191 static const struct devlink_ops mlx4_devlink_ops = {
1192 .port_type_set = mlx4_devlink_port_type_set,
1193+ .reload_actions = BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT),
1194 .reload_down = mlx4_devlink_reload_down,
1195 .reload_up = mlx4_devlink_reload_up,
1196 };
1197diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
1198index 9fa4b98..75afcab 100644
1199--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
1200+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
1201@@ -222,7 +222,8 @@ static int mlx5e_rx_reporter_recover_from_ctx(struct mlx5e_err_ctx *err_ctx)
1202 }
1203
1204 static int mlx5e_rx_reporter_recover(struct devlink_health_reporter *reporter,
1205- void *context)
1206+ void *context,
1207+ struct netlink_ext_ack *extack)
1208 {
1209 struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter);
1210 struct mlx5e_err_ctx *err_ctx = context;
1211@@ -301,7 +302,8 @@ static int mlx5e_rx_reporter_build_diagnose_output(struct mlx5e_rq *rq,
1212 }
1213
1214 static int mlx5e_rx_reporter_diagnose(struct devlink_health_reporter *reporter,
1215- struct devlink_fmsg *fmsg)
1216+ struct devlink_fmsg *fmsg,
1217+ struct netlink_ext_ack *extack)
1218 {
1219 struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter);
1220 struct mlx5e_params *params = &priv->channels.params;
1221@@ -385,7 +387,7 @@ int mlx5e_reporter_rx_create(struct mlx5e_priv *priv)
1222 reporter = devlink_health_reporter_create(devlink,
1223 &mlx5_rx_reporter_ops,
1224 MLX5E_REPORTER_RX_GRACEFUL_PERIOD,
1225- true, priv);
1226+ priv);
1227 if (IS_ERR(reporter)) {
1228 netdev_warn(priv->netdev, "Failed to create rx reporter, err = %ld\n",
1229 PTR_ERR(reporter));
1230diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
1231index bfed558..8b1089f 100644
1232--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
1233+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
1234@@ -135,7 +135,8 @@ static int mlx5e_tx_reporter_recover_from_ctx(struct mlx5e_err_ctx *err_ctx)
1235 }
1236
1237 static int mlx5e_tx_reporter_recover(struct devlink_health_reporter *reporter,
1238- void *context)
1239+ void *context,
1240+ struct netlink_ext_ack *extack)
1241 {
1242 struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter);
1243 struct mlx5e_err_ctx *err_ctx = context;
1244@@ -205,7 +206,8 @@ mlx5e_tx_reporter_build_diagnose_output(struct devlink_fmsg *fmsg,
1245 }
1246
1247 static int mlx5e_tx_reporter_diagnose(struct devlink_health_reporter *reporter,
1248- struct devlink_fmsg *fmsg)
1249+ struct devlink_fmsg *fmsg,
1250+ struct netlink_ext_ack *extack)
1251 {
1252 struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter);
1253 struct mlx5e_txqsq *generic_sq = priv->txq2sq[0];
1254@@ -291,7 +293,7 @@ int mlx5e_reporter_tx_create(struct mlx5e_priv *priv)
1255 reporter =
1256 devlink_health_reporter_create(devlink, &mlx5_tx_reporter_ops,
1257 MLX5_REPORTER_TX_GRACEFUL_PERIOD,
1258- true, priv);
1259+ priv);
1260 if (IS_ERR(reporter)) {
1261 netdev_warn(priv->netdev,
1262 "Failed to create tx reporter, err = %ld\n",
1263diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
1264index f628887..780ae27 100644
1265--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
1266+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
1267@@ -398,7 +398,8 @@ static void print_health_info(struct mlx5_core_dev *dev)
1268
1269 static int
1270 mlx5_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
1271- struct devlink_fmsg *fmsg)
1272+ struct devlink_fmsg *fmsg,
1273+ struct netlink_ext_ack *extack)
1274 {
1275 struct mlx5_core_dev *dev = devlink_health_reporter_priv(reporter);
1276 struct mlx5_core_health *health = &dev->priv.health;
1277@@ -499,7 +500,8 @@ mlx5_fw_reporter_heath_buffer_data_put(struct mlx5_core_dev *dev,
1278
1279 static int
1280 mlx5_fw_reporter_dump(struct devlink_health_reporter *reporter,
1281- struct devlink_fmsg *fmsg, void *priv_ctx)
1282+ struct devlink_fmsg *fmsg, void *priv_ctx,
1283+ struct netlink_ext_ack *extack)
1284 {
1285 struct mlx5_core_dev *dev = devlink_health_reporter_priv(reporter);
1286 int err;
1287@@ -553,7 +555,8 @@ static const struct devlink_health_reporter_ops mlx5_fw_reporter_ops = {
1288
1289 static int
1290 mlx5_fw_fatal_reporter_recover(struct devlink_health_reporter *reporter,
1291- void *priv_ctx)
1292+ void *priv_ctx,
1293+ struct netlink_ext_ack *extack)
1294 {
1295 struct mlx5_core_dev *dev = devlink_health_reporter_priv(reporter);
1296
1297@@ -563,7 +566,8 @@ mlx5_fw_fatal_reporter_recover(struct devlink_health_reporter *reporter,
1298 #define MLX5_CR_DUMP_CHUNK_SIZE 256
1299 static int
1300 mlx5_fw_fatal_reporter_dump(struct devlink_health_reporter *reporter,
1301- struct devlink_fmsg *fmsg, void *priv_ctx)
1302+ struct devlink_fmsg *fmsg, void *priv_ctx,
1303+ struct netlink_ext_ack *extack)
1304 {
1305 struct mlx5_core_dev *dev = devlink_health_reporter_priv(reporter);
1306 u32 crdump_size = dev->priv.health.crdump_size;
1307@@ -647,7 +651,7 @@ static void mlx5_fw_reporters_create(struct mlx5_core_dev *dev)
1308
1309 health->fw_reporter =
1310 devlink_health_reporter_create(devlink, &mlx5_fw_reporter_ops,
1311- 0, false, dev);
1312+ 0, dev);
1313 if (IS_ERR(health->fw_reporter))
1314 mlx5_core_warn(dev, "Failed to create fw reporter, err = %ld\n",
1315 PTR_ERR(health->fw_reporter));
1316@@ -656,7 +660,7 @@ static void mlx5_fw_reporters_create(struct mlx5_core_dev *dev)
1317 devlink_health_reporter_create(devlink,
1318 &mlx5_fw_fatal_reporter_ops,
1319 MLX5_REPORTER_FW_GRACEFUL_PERIOD,
1320- true, dev);
1321+ dev);
1322 if (IS_ERR(health->fw_fatal_reporter))
1323 mlx5_core_warn(dev, "Failed to create fw fatal reporter, err = %ld\n",
1324 PTR_ERR(health->fw_fatal_reporter));
1325diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
1326index 2d39bad..dff8d24 100644
1327--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
1328+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
1329@@ -990,6 +990,8 @@ mlxsw_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
1330
1331 static int
1332 mlxsw_devlink_core_bus_device_reload_down(struct devlink *devlink,
1333+ bool netns_change, enum devlink_reload_action action,
1334+ enum devlink_reload_limit limit,
1335 struct netlink_ext_ack *extack)
1336 {
1337 struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
1338@@ -1002,11 +1004,14 @@ mlxsw_devlink_core_bus_device_reload_down(struct devlink *devlink,
1339 }
1340
1341 static int
1342-mlxsw_devlink_core_bus_device_reload_up(struct devlink *devlink,
1343+mlxsw_devlink_core_bus_device_reload_up(struct devlink *devlink, enum devlink_reload_action action,
1344+ enum devlink_reload_limit limit, u32 *actions_performed,
1345 struct netlink_ext_ack *extack)
1346 {
1347 struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
1348
1349+ *actions_performed = BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT) |
1350+ BIT(DEVLINK_RELOAD_ACTION_FW_ACTIVATE);
1351 return mlxsw_core_bus_device_register(mlxsw_core->bus_info,
1352 mlxsw_core->bus,
1353 mlxsw_core->bus_priv, true,
1354@@ -1076,6 +1081,8 @@ mlxsw_devlink_trap_group_init(struct devlink *devlink,
1355 }
1356
1357 static const struct devlink_ops mlxsw_devlink_ops = {
1358+ .reload_actions = BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT) |
1359+ BIT(DEVLINK_RELOAD_ACTION_FW_ACTIVATE),
1360 .reload_down = mlxsw_devlink_core_bus_device_reload_down,
1361 .reload_up = mlxsw_devlink_core_bus_device_reload_up,
1362 .port_type_set = mlxsw_devlink_port_type_set,
1363@@ -1889,12 +1896,19 @@ static int __mlxsw_core_port_init(struct mlxsw_core *mlxsw_core, u8 local_port,
1364 struct mlxsw_core_port *mlxsw_core_port =
1365 &mlxsw_core->ports[local_port];
1366 struct devlink_port *devlink_port = &mlxsw_core_port->devlink_port;
1367+ struct devlink_port_attrs attrs = {};
1368 int err;
1369
1370+ attrs.split = split;
1371+ attrs.lanes = lanes;
1372+ attrs.splittable = splittable;
1373+ attrs.flavour = flavour;
1374+ attrs.phys.port_number = port_number;
1375+ attrs.phys.split_subport_number = split_port_subnumber;
1376+ memcpy(attrs.switch_id.id, switch_id, switch_id_len);
1377+ attrs.switch_id.id_len = switch_id_len;
1378 mlxsw_core_port->local_port = local_port;
1379- devlink_port_attrs_set(devlink_port, flavour, port_number,
1380- split, split_port_subnumber,
1381- switch_id, switch_id_len);
1382+ devlink_port_attrs_set(devlink_port, &attrs);
1383 err = devlink_port_register(devlink, devlink_port, local_port);
1384 if (err)
1385 memset(mlxsw_core_port, 0, sizeof(*mlxsw_core_port));
1386diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
1387index 5d7d2ab..30a718a 100644
1388--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
1389+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
1390@@ -18,6 +18,11 @@
1391 #include "cmd.h"
1392 #include "resources.h"
1393
1394+enum mlxsw_core_resource_id {
1395+ MLXSW_CORE_RESOURCE_PORTS = 1,
1396+ MLXSW_CORE_RESOURCE_MAX,
1397+};
1398+
1399 struct mlxsw_core;
1400 struct mlxsw_core_port;
1401 struct mlxsw_driver;
1402diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
1403index b2a0028..9f7107e 100644
1404--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
1405+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
1406@@ -48,7 +48,7 @@
1407 #define MLXSW_SP_RESOURCE_NAME_KVD_LINEAR_LARGE_CHUNKS "large_chunks"
1408
1409 enum mlxsw_sp_resource_id {
1410- MLXSW_SP_RESOURCE_KVD = 1,
1411+ MLXSW_SP_RESOURCE_KVD = MLXSW_CORE_RESOURCE_MAX,
1412 MLXSW_SP_RESOURCE_KVD_LINEAR,
1413 MLXSW_SP_RESOURCE_KVD_HASH_SINGLE,
1414 MLXSW_SP_RESOURCE_KVD_HASH_DOUBLE,
1415diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
1416index c50fce4..b6a1056 100644
1417--- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
1418+++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
1419@@ -211,7 +211,7 @@ static const struct nfp_devlink_versions {
1420 enum nfp_nsp_versions id;
1421 const char *key;
1422 } nfp_devlink_versions_nsp[] = {
1423- { NFP_VERSIONS_BUNDLE, "fw.bundle_id", },
1424+ { NFP_VERSIONS_BUNDLE, DEVLINK_INFO_VERSION_GENERIC_FW_BUNDLE_ID, },
1425 { NFP_VERSIONS_BSP, DEVLINK_INFO_VERSION_GENERIC_FW_MGMT, },
1426 { NFP_VERSIONS_CPLD, "fw.cpld", },
1427 { NFP_VERSIONS_APP, DEVLINK_INFO_VERSION_GENERIC_FW_APP, },
1428@@ -353,6 +353,7 @@ const struct devlink_ops nfp_devlink_ops = {
1429
1430 int nfp_devlink_port_register(struct nfp_app *app, struct nfp_port *port)
1431 {
1432+ struct devlink_port_attrs attrs = {};
1433 struct nfp_eth_table_port eth_port;
1434 struct devlink *devlink;
1435 const u8 *serial;
1436@@ -365,10 +366,15 @@ int nfp_devlink_port_register(struct nfp_app *app, struct nfp_port *port)
1437 if (ret)
1438 return ret;
1439
1440+ attrs.split = eth_port.is_split;
1441+ attrs.splittable = !attrs.split;
1442+ attrs.flavour = DEVLINK_PORT_FLAVOUR_PHYSICAL;
1443+ attrs.phys.port_number = eth_port.label_port;
1444+ attrs.phys.split_subport_number = eth_port.label_subport;
1445 serial_len = nfp_cpp_serial(port->app->cpp, &serial);
1446- devlink_port_attrs_set(&port->dl_port, DEVLINK_PORT_FLAVOUR_PHYSICAL,
1447- eth_port.label_port, eth_port.is_split,
1448- eth_port.label_subport, serial, serial_len);
1449+ memcpy(attrs.switch_id.id, serial, serial_len);
1450+ attrs.switch_id.id_len = serial_len;
1451+ devlink_port_attrs_set(&port->dl_port, &attrs);
1452
1453 devlink = priv_to_devlink(app->pf);
1454
1455diff --git a/drivers/net/ethernet/pensando/ionic/ionic_devlink.c b/drivers/net/ethernet/pensando/ionic/ionic_devlink.c
1456index af1647a..6a6feb0 100644
1457--- a/drivers/net/ethernet/pensando/ionic/ionic_devlink.c
1458+++ b/drivers/net/ethernet/pensando/ionic/ionic_devlink.c
1459@@ -70,6 +70,7 @@ void ionic_devlink_free(struct ionic *ionic)
1460 int ionic_devlink_register(struct ionic *ionic)
1461 {
1462 struct devlink *dl = priv_to_devlink(ionic);
1463+ struct devlink_port_attrs attrs = {};
1464 int err;
1465
1466 err = devlink_register(dl, ionic->dev);
1467@@ -78,8 +79,8 @@ int ionic_devlink_register(struct ionic *ionic)
1468 return err;
1469 }
1470
1471- devlink_port_attrs_set(&ionic->dl_port, DEVLINK_PORT_FLAVOUR_PHYSICAL,
1472- 0, false, 0, NULL, 0);
1473+ attrs.flavour = DEVLINK_PORT_FLAVOUR_PHYSICAL;
1474+ devlink_port_attrs_set(&ionic->dl_port, &attrs);
1475 err = devlink_port_register(dl, &ionic->dl_port, 0);
1476 if (err)
1477 dev_err(ionic->dev, "devlink_port_register failed: %d\n", err);
1478diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
1479index 91b302f..218b4bc 100644
1480--- a/drivers/net/netdevsim/dev.c
1481+++ b/drivers/net/netdevsim/dev.c
1482@@ -43,19 +43,27 @@ static ssize_t nsim_dev_take_snapshot_write(struct file *file,
1483 size_t count, loff_t *ppos)
1484 {
1485 struct nsim_dev *nsim_dev = file->private_data;
1486+ struct devlink *devlink;
1487 void *dummy_data;
1488 int err;
1489 u32 id;
1490
1491+ devlink = priv_to_devlink(nsim_dev);
1492+
1493 dummy_data = kmalloc(NSIM_DEV_DUMMY_REGION_SIZE, GFP_KERNEL);
1494 if (!dummy_data)
1495 return -ENOMEM;
1496
1497 get_random_bytes(dummy_data, NSIM_DEV_DUMMY_REGION_SIZE);
1498
1499- id = devlink_region_shapshot_id_get(priv_to_devlink(nsim_dev));
1500+ err = devlink_region_snapshot_id_get(devlink, &id);
1501+ if (err) {
1502+ pr_err("Failed to get snapshot id\n");
1503+ return err;
1504+ }
1505 err = devlink_region_snapshot_create(nsim_dev->dummy_region,
1506- dummy_data, id, kfree);
1507+ dummy_data, id);
1508+ devlink_region_snapshot_id_put(devlink, id);
1509 if (err) {
1510 pr_err("Failed to create region snapshot\n");
1511 kfree(dummy_data);
1512@@ -293,11 +301,16 @@ static void nsim_devlink_param_load_driverinit_values(struct devlink *devlink)
1513
1514 #define NSIM_DEV_DUMMY_REGION_SNAPSHOT_MAX 16
1515
1516+static const struct devlink_region_ops dummy_region_ops = {
1517+ .name = "dummy",
1518+ .destructor = &kfree,
1519+};
1520+
1521 static int nsim_dev_dummy_region_init(struct nsim_dev *nsim_dev,
1522 struct devlink *devlink)
1523 {
1524 nsim_dev->dummy_region =
1525- devlink_region_create(devlink, "dummy",
1526+ devlink_region_create(devlink, &dummy_region_ops,
1527 NSIM_DEV_DUMMY_REGION_SNAPSHOT_MAX,
1528 NSIM_DEV_DUMMY_REGION_SIZE);
1529 return PTR_ERR_OR_ZERO(nsim_dev->dummy_region);
1530@@ -321,7 +334,7 @@ struct nsim_trap_data {
1531 };
1532
1533 /* All driver-specific traps must be documented in
1534- * Documentation/networking/devlink-trap-netdevsim.rst
1535+ * Documentation/networking/devlink/devlink-trap-netdevsim.rst
1536 */
1537 enum {
1538 NSIM_TRAP_ID_BASE = DEVLINK_TRAP_GENERIC_ID_MAX,
1539@@ -433,7 +446,7 @@ static void nsim_dev_trap_report(struct nsim_dev_port *nsim_dev_port)
1540 */
1541 local_bh_disable();
1542 devlink_trap_report(devlink, skb, nsim_trap_item->trap_ctx,
1543- &nsim_dev_port->devlink_port);
1544+ &nsim_dev_port->devlink_port, NULL);
1545 local_bh_enable();
1546 consume_skb(skb);
1547 }
1548@@ -749,6 +762,7 @@ static void nsim_dev_destroy(struct nsim_dev *nsim_dev)
1549 static int __nsim_dev_port_add(struct nsim_dev *nsim_dev,
1550 unsigned int port_index)
1551 {
1552+ struct devlink_port_attrs attrs = {};
1553 struct nsim_dev_port *nsim_dev_port;
1554 struct devlink_port *devlink_port;
1555 int err;
1556@@ -759,10 +773,11 @@ static int __nsim_dev_port_add(struct nsim_dev *nsim_dev,
1557 nsim_dev_port->port_index = port_index;
1558
1559 devlink_port = &nsim_dev_port->devlink_port;
1560- devlink_port_attrs_set(devlink_port, DEVLINK_PORT_FLAVOUR_PHYSICAL,
1561- port_index + 1, 0, 0,
1562- nsim_dev->switch_id.id,
1563- nsim_dev->switch_id.id_len);
1564+ attrs.flavour = DEVLINK_PORT_FLAVOUR_PHYSICAL;
1565+ attrs.phys.port_number = port_index + 1;
1566+ memcpy(attrs.switch_id.id, nsim_dev->switch_id.id, nsim_dev->switch_id.id_len);
1567+ attrs.switch_id.id_len = nsim_dev->switch_id.id_len;
1568+ devlink_port_attrs_set(devlink_port, &attrs);
1569 err = devlink_port_register(priv_to_devlink(nsim_dev), devlink_port,
1570 port_index);
1571 if (err)
1572diff --git a/include/net/devlink.h b/include/net/devlink.h
1573index ac3ad16..08f4c61 100644
1574--- a/include/net/devlink.h
1575+++ b/include/net/devlink.h
1576@@ -16,28 +16,44 @@
1577 #include <linux/workqueue.h>
1578 #include <linux/refcount.h>
1579 #include <net/net_namespace.h>
1580+#include <net/flow_offload.h>
1581 #include <uapi/linux/devlink.h>
1582+#include <linux/xarray.h>
1583+#include <linux/firmware.h>
1584+
1585+#define DEVLINK_RELOAD_STATS_ARRAY_SIZE \
1586+ (__DEVLINK_RELOAD_LIMIT_MAX * __DEVLINK_RELOAD_ACTION_MAX)
1587+
1588+struct devlink_dev_stats {
1589+ u32 reload_stats[DEVLINK_RELOAD_STATS_ARRAY_SIZE];
1590+ u32 remote_reload_stats[DEVLINK_RELOAD_STATS_ARRAY_SIZE];
1591+};
1592
1593 struct devlink_ops;
1594
1595 struct devlink {
1596 struct list_head list;
1597 struct list_head port_list;
1598+ struct list_head rate_list;
1599 struct list_head sb_list;
1600 struct list_head dpipe_table_list;
1601 struct list_head resource_list;
1602 struct list_head param_list;
1603 struct list_head region_list;
1604- u32 snapshot_id;
1605 struct list_head reporter_list;
1606 struct mutex reporters_lock; /* protects reporter_list */
1607 struct devlink_dpipe_headers *dpipe_headers;
1608 struct list_head trap_list;
1609 struct list_head trap_group_list;
1610+ struct list_head trap_policer_list;
1611 const struct devlink_ops *ops;
1612+ struct xarray snapshot_ids;
1613+ struct devlink_dev_stats stats;
1614 struct device *dev;
1615 possible_net_t _net;
1616- struct mutex lock;
1617+ struct mutex lock; /* Serializes access to devlink instance specific objects such as
1618+ * port, sb, dpipe, resource, params, region, traps and more.
1619+ */
1620 u8 reload_failed:1,
1621 reload_enabled:1;
1622 char priv[0] __aligned(NETDEV_ALIGN);
1623@@ -48,37 +64,99 @@ struct devlink_port_phys_attrs {
1624 * A physical port which is visible to the user
1625 * for a given port flavour.
1626 */
1627- u32 split_subport_number;
1628+ u32 split_subport_number; /* If the port is split, this is the subport number. */
1629 };
1630
1631+/**
1632+ * struct devlink_port_pci_pf_attrs - devlink port's PCI PF attributes
1633+ * @controller: Associated controller number
1634+ * @pf: Associated PCI PF number for this port.
1635+ * @external: when set, indicates if a port is for an external controller
1636+ */
1637 struct devlink_port_pci_pf_attrs {
1638- u16 pf; /* Associated PCI PF for this port. */
1639+ u32 controller;
1640+ u16 pf;
1641+ u8 external:1;
1642 };
1643
1644+/**
1645+ * struct devlink_port_pci_vf_attrs - devlink port's PCI VF attributes
1646+ * @controller: Associated controller number
1647+ * @pf: Associated PCI PF number for this port.
1648+ * @vf: Associated PCI VF of the PCI PF for this port.
1649+ * @external: when set, indicates if a port is for an external controller
1650+ */
1651 struct devlink_port_pci_vf_attrs {
1652- u16 pf; /* Associated PCI PF for this port. */
1653- u16 vf; /* Associated PCI VF for of the PCI PF for this port. */
1654+ u32 controller;
1655+ u16 pf;
1656+ u16 vf;
1657+ u8 external:1;
1658+};
1659+
1660+/**
1661+ * struct devlink_port_pci_sf_attrs - devlink port's PCI SF attributes
1662+ * @controller: Associated controller number
1663+ * @sf: Associated PCI SF of the PCI PF for this port.
1664+ * @pf: Associated PCI PF number for this port.
1665+ * @external: when set, indicates if a port is for an external controller
1666+ */
1667+struct devlink_port_pci_sf_attrs {
1668+ u32 controller;
1669+ u32 sf;
1670+ u16 pf;
1671+ u8 external:1;
1672 };
1673
1674+/**
1675+ * struct devlink_port_attrs - devlink port object
1676+ * @flavour: flavour of the port
1677+ * @split: indicates if this is a split port
1678+ * @splittable: indicates if the port can be split.
1679+ * @lanes: maximum number of lanes the port supports. 0 value is not passed to netlink.
1680+ * @switch_id: if the port is part of a switch, this is the buffer with its ID, otherwise this is NULL
1681+ * @phys: physical port attributes
1682+ * @pci_pf: PCI PF port attributes
1683+ * @pci_vf: PCI VF port attributes
1684+ * @pci_sf: PCI SF port attributes
1685+ */
1686 struct devlink_port_attrs {
1687- u8 set:1,
1688- split:1,
1689- switch_port:1;
1690+ u8 split:1,
1691+ splittable:1;
1692+ u32 lanes;
1693 enum devlink_port_flavour flavour;
1694 struct netdev_phys_item_id switch_id;
1695 union {
1696 struct devlink_port_phys_attrs phys;
1697 struct devlink_port_pci_pf_attrs pci_pf;
1698 struct devlink_port_pci_vf_attrs pci_vf;
1699+ struct devlink_port_pci_sf_attrs pci_sf;
1700+ };
1701+};
1702+
1703+struct devlink_rate {
1704+ struct list_head list;
1705+ enum devlink_rate_type type;
1706+ struct devlink *devlink;
1707+ void *priv;
1708+ u64 tx_share;
1709+ u64 tx_max;
1710+
1711+ struct devlink_rate *parent;
1712+ union {
1713+ struct devlink_port *devlink_port;
1714+ struct {
1715+ char *name;
1716+ refcount_t refcnt;
1717+ };
1718 };
1719 };
1720
1721 struct devlink_port {
1722 struct list_head list;
1723 struct list_head param_list;
1724+ struct list_head region_list;
1725 struct devlink *devlink;
1726 unsigned int index;
1727- bool registered;
1728 spinlock_t type_lock; /* Protects type and type_dev
1729 * pointer consistency.
1730 */
1731@@ -86,7 +164,24 @@ struct devlink_port {
1732 enum devlink_port_type desired_type;
1733 void *type_dev;
1734 struct devlink_port_attrs attrs;
1735+ u8 attrs_set:1,
1736+ switch_port:1;
1737 struct delayed_work type_warn_dw;
1738+ struct list_head reporter_list;
1739+ struct mutex reporters_lock; /* Protects reporter_list */
1740+
1741+ struct devlink_rate *devlink_rate;
1742+};
1743+
1744+struct devlink_port_new_attrs {
1745+ enum devlink_port_flavour flavour;
1746+ unsigned int port_index;
1747+ u32 controller;
1748+ u32 sfnum;
1749+ u16 pfnum;
1750+ u8 port_index_valid:1,
1751+ controller_valid:1,
1752+ sfnum_valid:1;
1753 };
1754
1755 struct devlink_sb_pool_info {
1756@@ -331,6 +426,8 @@ struct devlink_resource {
1757
1758 #define DEVLINK_RESOURCE_ID_PARENT_TOP 0
1759
1760+#define DEVLINK_RESOURCE_GENERIC_NAME_PORTS "physical_ports"
1761+
1762 #define __DEVLINK_PARAM_MAX_STRING_VALUE 32
1763 enum devlink_param_type {
1764 DEVLINK_PARAM_TYPE_U8,
1765@@ -354,6 +451,25 @@ struct devlink_param_gset_ctx {
1766 };
1767
1768 /**
1769+ * struct devlink_flash_notify - devlink dev flash notify data
1770+ * @status_msg: current status string
1771+ * @component: firmware component being updated
1772+ * @done: amount of work completed of total amount
1773+ * @total: amount of work expected to be done
1774+ * @timeout: expected max timeout in seconds
1775+ *
1776+ * These are values to be given to userland to be displayed in order
1777+ * to show current activity in a firmware update process.
1778+ */
1779+struct devlink_flash_notify {
1780+ const char *status_msg;
1781+ const char *component;
1782+ unsigned long done;
1783+ unsigned long total;
1784+ unsigned long timeout;
1785+};
1786+
1787+/**
1788 * struct devlink_param - devlink configuration parameter data
1789 * @name: name of the parameter
1790 * @generic: indicates if the parameter is generic or driver specific
1791@@ -402,6 +518,7 @@ enum devlink_param_generic_id {
1792 DEVLINK_PARAM_GENERIC_ID_FW_LOAD_POLICY,
1793 DEVLINK_PARAM_GENERIC_ID_RESET_DEV_ON_DRV_PROBE,
1794 DEVLINK_PARAM_GENERIC_ID_ENABLE_ROCE,
1795+ DEVLINK_PARAM_GENERIC_ID_ENABLE_REMOTE_DEV_RESET,
1796
1797 /* add new param generic ids above here*/
1798 __DEVLINK_PARAM_GENERIC_ID_MAX,
1799@@ -439,6 +556,9 @@ enum devlink_param_generic_id {
1800 #define DEVLINK_PARAM_GENERIC_ENABLE_ROCE_NAME "enable_roce"
1801 #define DEVLINK_PARAM_GENERIC_ENABLE_ROCE_TYPE DEVLINK_PARAM_TYPE_BOOL
1802
1803+#define DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_NAME "enable_remote_dev_reset"
1804+#define DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_TYPE DEVLINK_PARAM_TYPE_BOOL
1805+
1806 #define DEVLINK_PARAM_GENERIC(_id, _cmodes, _get, _set, _validate) \
1807 { \
1808 .id = DEVLINK_PARAM_GENERIC_ID_##_id, \
1809@@ -478,17 +598,81 @@ enum devlink_param_generic_id {
1810 #define DEVLINK_INFO_VERSION_GENERIC_FW "fw"
1811 /* Control processor FW version */
1812 #define DEVLINK_INFO_VERSION_GENERIC_FW_MGMT "fw.mgmt"
1813+/* FW interface specification version */
1814+#define DEVLINK_INFO_VERSION_GENERIC_FW_MGMT_API "fw.mgmt.api"
1815 /* Data path microcode controlling high-speed packet processing */
1816 #define DEVLINK_INFO_VERSION_GENERIC_FW_APP "fw.app"
1817 /* UNDI software version */
1818 #define DEVLINK_INFO_VERSION_GENERIC_FW_UNDI "fw.undi"
1819 /* NCSI support/handler version */
1820 #define DEVLINK_INFO_VERSION_GENERIC_FW_NCSI "fw.ncsi"
1821+/* FW parameter set id */
1822+#define DEVLINK_INFO_VERSION_GENERIC_FW_PSID "fw.psid"
1823+/* RoCE FW version */
1824+#define DEVLINK_INFO_VERSION_GENERIC_FW_ROCE "fw.roce"
1825+/* Firmware bundle identifier */
1826+#define DEVLINK_INFO_VERSION_GENERIC_FW_BUNDLE_ID "fw.bundle_id"
1827+
1828+/**
1829+ * struct devlink_flash_update_params - Flash Update parameters
1830+ * @fw: pointer to the firmware data to update from
1831+ * @component: the flash component to update
1832+ *
1833+ * With the exception of fw, drivers must opt-in to parameters by
1834+ * setting the appropriate bit in the supported_flash_update_params field in
1835+ * their devlink_ops structure.
1836+ */
1837+struct devlink_flash_update_params {
1838+ const struct firmware *fw;
1839+ const char *component;
1840+ u32 overwrite_mask;
1841+};
1842+
1843+#define DEVLINK_SUPPORT_FLASH_UPDATE_COMPONENT BIT(0)
1844+#define DEVLINK_SUPPORT_FLASH_UPDATE_OVERWRITE_MASK BIT(1)
1845
1846 struct devlink_region;
1847 struct devlink_info_req;
1848
1849-typedef void devlink_snapshot_data_dest_t(const void *data);
1850+/**
1851+ * struct devlink_region_ops - Region operations
1852+ * @name: region name
1853+ * @destructor: callback used to free snapshot memory when deleting
1854+ * @snapshot: callback to request an immediate snapshot. On success,
1855+ * the data variable must be updated to point to the snapshot data.
1856+ * The function will be called while the devlink instance lock is
1857+ * held.
1858+ * @priv: Pointer to driver private data for the region operation
1859+ */
1860+struct devlink_region_ops {
1861+ const char *name;
1862+ void (*destructor)(const void *data);
1863+ int (*snapshot)(struct devlink *devlink,
1864+ const struct devlink_region_ops *ops,
1865+ struct netlink_ext_ack *extack,
1866+ u8 **data);
1867+ void *priv;
1868+};
1869+
1870+/**
1871+ * struct devlink_port_region_ops - Region operations for a port
1872+ * @name: region name
1873+ * @destructor: callback used to free snapshot memory when deleting
1874+ * @snapshot: callback to request an immediate snapshot. On success,
1875+ * the data variable must be updated to point to the snapshot data.
1876+ * The function will be called while the devlink instance lock is
1877+ * held.
1878+ * @priv: Pointer to driver private data for the region operation
1879+ */
1880+struct devlink_port_region_ops {
1881+ const char *name;
1882+ void (*destructor)(const void *data);
1883+ int (*snapshot)(struct devlink_port *port,
1884+ const struct devlink_port_region_ops *ops,
1885+ struct netlink_ext_ack *extack,
1886+ u8 **data);
1887+ void *priv;
1888+};
1889
1890 struct devlink_fmsg;
1891 struct devlink_health_reporter;
1892@@ -506,16 +690,60 @@ enum devlink_health_reporter_state {
1893 * @dump: callback to dump an object
1894 * if priv_ctx is NULL, run a full dump
1895 * @diagnose: callback to diagnose the current status
1896+ * @test: callback to trigger a test event
1897 */
1898
1899 struct devlink_health_reporter_ops {
1900 char *name;
1901 int (*recover)(struct devlink_health_reporter *reporter,
1902- void *priv_ctx);
1903+ void *priv_ctx, struct netlink_ext_ack *extack);
1904 int (*dump)(struct devlink_health_reporter *reporter,
1905- struct devlink_fmsg *fmsg, void *priv_ctx);
1906+ struct devlink_fmsg *fmsg, void *priv_ctx,
1907+ struct netlink_ext_ack *extack);
1908 int (*diagnose)(struct devlink_health_reporter *reporter,
1909- struct devlink_fmsg *fmsg);
1910+ struct devlink_fmsg *fmsg,
1911+ struct netlink_ext_ack *extack);
1912+ int (*test)(struct devlink_health_reporter *reporter,
1913+ struct netlink_ext_ack *extack);
1914+};
1915+
1916+/**
1917+ * struct devlink_trap_metadata - Packet trap metadata.
1918+ * @trap_name: Trap name.
1919+ * @trap_group_name: Trap group name.
1920+ * @input_dev: Input netdevice.
1921+ * @fa_cookie: Flow action user cookie.
1922+ * @trap_type: Trap type.
1923+ */
1924+struct devlink_trap_metadata {
1925+ const char *trap_name;
1926+ const char *trap_group_name;
1927+ struct net_device *input_dev;
1928+ const struct flow_action_cookie *fa_cookie;
1929+ enum devlink_trap_type trap_type;
1930+};
1931+
1932+/**
1933+ * struct devlink_trap_policer - Immutable packet trap policer attributes.
1934+ * @id: Policer identifier.
1935+ * @init_rate: Initial rate in packets / sec.
1936+ * @init_burst: Initial burst size in packets.
1937+ * @max_rate: Maximum rate.
1938+ * @min_rate: Minimum rate.
1939+ * @max_burst: Maximum burst size.
1940+ * @min_burst: Minimum burst size.
1941+ *
1942+ * Describes immutable attributes of packet trap policers that drivers register
1943+ * with devlink.
1944+ */
1945+struct devlink_trap_policer {
1946+ u32 id;
1947+ u64 init_rate;
1948+ u64 init_burst;
1949+ u64 max_rate;
1950+ u64 min_rate;
1951+ u64 max_burst;
1952+ u64 min_burst;
1953 };
1954
1955 /**
1956@@ -523,6 +751,7 @@ struct devlink_health_reporter_ops {
1957 * @name: Trap group name.
1958 * @id: Trap group identifier.
1959 * @generic: Whether the trap group is generic or not.
1960+ * @init_policer_id: Initial policer identifier.
1961 *
1962 * Describes immutable attributes of packet trap groups that drivers register
1963 * with devlink.
1964@@ -531,9 +760,11 @@ struct devlink_trap_group {
1965 const char *name;
1966 u16 id;
1967 bool generic;
1968+ u32 init_policer_id;
1969 };
1970
1971 #define DEVLINK_TRAP_METADATA_TYPE_F_IN_PORT BIT(0)
1972+#define DEVLINK_TRAP_METADATA_TYPE_F_FA_COOKIE BIT(1)
1973
1974 /**
1975 * struct devlink_trap - Immutable packet trap attributes.
1976@@ -542,7 +773,7 @@ struct devlink_trap_group {
1977 * @generic: Whether the trap is generic or not.
1978 * @id: Trap identifier.
1979 * @name: Trap name.
1980- * @group: Immutable packet trap group attributes.
1981+ * @init_group_id: Initial group identifier.
1982 * @metadata_cap: Metadata types that can be provided by the trap.
1983 *
1984 * Describes immutable attributes of packet traps that drivers register with
1985@@ -554,12 +785,12 @@ struct devlink_trap {
1986 bool generic;
1987 u16 id;
1988 const char *name;
1989- struct devlink_trap_group group;
1990+ u16 init_group_id;
1991 u32 metadata_cap;
1992 };
1993
1994 /* All traps must be documented in
1995- * Documentation/networking/devlink-trap.rst
1996+ * Documentation/networking/devlink/devlink-trap.rst
1997 */
1998 enum devlink_trap_generic_id {
1999 DEVLINK_TRAP_GENERIC_ID_SMAC_MC,
2000@@ -571,6 +802,89 @@ enum devlink_trap_generic_id {
2001 DEVLINK_TRAP_GENERIC_ID_BLACKHOLE_ROUTE,
2002 DEVLINK_TRAP_GENERIC_ID_TTL_ERROR,
2003 DEVLINK_TRAP_GENERIC_ID_TAIL_DROP,
2004+ DEVLINK_TRAP_GENERIC_ID_NON_IP_PACKET,
2005+ DEVLINK_TRAP_GENERIC_ID_UC_DIP_MC_DMAC,
2006+ DEVLINK_TRAP_GENERIC_ID_DIP_LB,
2007+ DEVLINK_TRAP_GENERIC_ID_SIP_MC,
2008+ DEVLINK_TRAP_GENERIC_ID_SIP_LB,
2009+ DEVLINK_TRAP_GENERIC_ID_CORRUPTED_IP_HDR,
2010+ DEVLINK_TRAP_GENERIC_ID_IPV4_SIP_BC,
2011+ DEVLINK_TRAP_GENERIC_ID_IPV6_MC_DIP_RESERVED_SCOPE,
2012+ DEVLINK_TRAP_GENERIC_ID_IPV6_MC_DIP_INTERFACE_LOCAL_SCOPE,
2013+ DEVLINK_TRAP_GENERIC_ID_MTU_ERROR,
2014+ DEVLINK_TRAP_GENERIC_ID_UNRESOLVED_NEIGH,
2015+ DEVLINK_TRAP_GENERIC_ID_RPF,
2016+ DEVLINK_TRAP_GENERIC_ID_REJECT_ROUTE,
2017+ DEVLINK_TRAP_GENERIC_ID_IPV4_LPM_UNICAST_MISS,
2018+ DEVLINK_TRAP_GENERIC_ID_IPV6_LPM_UNICAST_MISS,
2019+ DEVLINK_TRAP_GENERIC_ID_NON_ROUTABLE,
2020+ DEVLINK_TRAP_GENERIC_ID_DECAP_ERROR,
2021+ DEVLINK_TRAP_GENERIC_ID_OVERLAY_SMAC_MC,
2022+ DEVLINK_TRAP_GENERIC_ID_INGRESS_FLOW_ACTION_DROP,
2023+ DEVLINK_TRAP_GENERIC_ID_EGRESS_FLOW_ACTION_DROP,
2024+ DEVLINK_TRAP_GENERIC_ID_STP,
2025+ DEVLINK_TRAP_GENERIC_ID_LACP,
2026+ DEVLINK_TRAP_GENERIC_ID_LLDP,
2027+ DEVLINK_TRAP_GENERIC_ID_IGMP_QUERY,
2028+ DEVLINK_TRAP_GENERIC_ID_IGMP_V1_REPORT,
2029+ DEVLINK_TRAP_GENERIC_ID_IGMP_V2_REPORT,
2030+ DEVLINK_TRAP_GENERIC_ID_IGMP_V3_REPORT,
2031+ DEVLINK_TRAP_GENERIC_ID_IGMP_V2_LEAVE,
2032+ DEVLINK_TRAP_GENERIC_ID_MLD_QUERY,
2033+ DEVLINK_TRAP_GENERIC_ID_MLD_V1_REPORT,
2034+ DEVLINK_TRAP_GENERIC_ID_MLD_V2_REPORT,
2035+ DEVLINK_TRAP_GENERIC_ID_MLD_V1_DONE,
2036+ DEVLINK_TRAP_GENERIC_ID_IPV4_DHCP,
2037+ DEVLINK_TRAP_GENERIC_ID_IPV6_DHCP,
2038+ DEVLINK_TRAP_GENERIC_ID_ARP_REQUEST,
2039+ DEVLINK_TRAP_GENERIC_ID_ARP_RESPONSE,
2040+ DEVLINK_TRAP_GENERIC_ID_ARP_OVERLAY,
2041+ DEVLINK_TRAP_GENERIC_ID_IPV6_NEIGH_SOLICIT,
2042+ DEVLINK_TRAP_GENERIC_ID_IPV6_NEIGH_ADVERT,
2043+ DEVLINK_TRAP_GENERIC_ID_IPV4_BFD,
2044+ DEVLINK_TRAP_GENERIC_ID_IPV6_BFD,
2045+ DEVLINK_TRAP_GENERIC_ID_IPV4_OSPF,
2046+ DEVLINK_TRAP_GENERIC_ID_IPV6_OSPF,
2047+ DEVLINK_TRAP_GENERIC_ID_IPV4_BGP,
2048+ DEVLINK_TRAP_GENERIC_ID_IPV6_BGP,
2049+ DEVLINK_TRAP_GENERIC_ID_IPV4_VRRP,
2050+ DEVLINK_TRAP_GENERIC_ID_IPV6_VRRP,
2051+ DEVLINK_TRAP_GENERIC_ID_IPV4_PIM,
2052+ DEVLINK_TRAP_GENERIC_ID_IPV6_PIM,
2053+ DEVLINK_TRAP_GENERIC_ID_UC_LB,
2054+ DEVLINK_TRAP_GENERIC_ID_LOCAL_ROUTE,
2055+ DEVLINK_TRAP_GENERIC_ID_EXTERNAL_ROUTE,
2056+ DEVLINK_TRAP_GENERIC_ID_IPV6_UC_DIP_LINK_LOCAL_SCOPE,
2057+ DEVLINK_TRAP_GENERIC_ID_IPV6_DIP_ALL_NODES,
2058+ DEVLINK_TRAP_GENERIC_ID_IPV6_DIP_ALL_ROUTERS,
2059+ DEVLINK_TRAP_GENERIC_ID_IPV6_ROUTER_SOLICIT,
2060+ DEVLINK_TRAP_GENERIC_ID_IPV6_ROUTER_ADVERT,
2061+ DEVLINK_TRAP_GENERIC_ID_IPV6_REDIRECT,
2062+ DEVLINK_TRAP_GENERIC_ID_IPV4_ROUTER_ALERT,
2063+ DEVLINK_TRAP_GENERIC_ID_IPV6_ROUTER_ALERT,
2064+ DEVLINK_TRAP_GENERIC_ID_PTP_EVENT,
2065+ DEVLINK_TRAP_GENERIC_ID_PTP_GENERAL,
2066+ DEVLINK_TRAP_GENERIC_ID_FLOW_ACTION_SAMPLE,
2067+ DEVLINK_TRAP_GENERIC_ID_FLOW_ACTION_TRAP,
2068+ DEVLINK_TRAP_GENERIC_ID_EARLY_DROP,
2069+ DEVLINK_TRAP_GENERIC_ID_VXLAN_PARSING,
2070+ DEVLINK_TRAP_GENERIC_ID_LLC_SNAP_PARSING,
2071+ DEVLINK_TRAP_GENERIC_ID_VLAN_PARSING,
2072+ DEVLINK_TRAP_GENERIC_ID_PPPOE_PPP_PARSING,
2073+ DEVLINK_TRAP_GENERIC_ID_MPLS_PARSING,
2074+ DEVLINK_TRAP_GENERIC_ID_ARP_PARSING,
2075+ DEVLINK_TRAP_GENERIC_ID_IP_1_PARSING,
2076+ DEVLINK_TRAP_GENERIC_ID_IP_N_PARSING,
2077+ DEVLINK_TRAP_GENERIC_ID_GRE_PARSING,
2078+ DEVLINK_TRAP_GENERIC_ID_UDP_PARSING,
2079+ DEVLINK_TRAP_GENERIC_ID_TCP_PARSING,
2080+ DEVLINK_TRAP_GENERIC_ID_IPSEC_PARSING,
2081+ DEVLINK_TRAP_GENERIC_ID_SCTP_PARSING,
2082+ DEVLINK_TRAP_GENERIC_ID_DCCP_PARSING,
2083+ DEVLINK_TRAP_GENERIC_ID_GTP_PARSING,
2084+ DEVLINK_TRAP_GENERIC_ID_ESP_PARSING,
2085+ DEVLINK_TRAP_GENERIC_ID_BLACKHOLE_NEXTHOP,
2086+ DEVLINK_TRAP_GENERIC_ID_DMAC_FILTER,
2087
2088 /* Add new generic trap IDs above */
2089 __DEVLINK_TRAP_GENERIC_ID_MAX,
2090@@ -578,12 +892,35 @@ enum devlink_trap_generic_id {
2091 };
2092
2093 /* All trap groups must be documented in
2094- * Documentation/networking/devlink-trap.rst
2095+ * Documentation/networking/devlink/devlink-trap.rst
2096 */
2097 enum devlink_trap_group_generic_id {
2098 DEVLINK_TRAP_GROUP_GENERIC_ID_L2_DROPS,
2099 DEVLINK_TRAP_GROUP_GENERIC_ID_L3_DROPS,
2100+ DEVLINK_TRAP_GROUP_GENERIC_ID_L3_EXCEPTIONS,
2101 DEVLINK_TRAP_GROUP_GENERIC_ID_BUFFER_DROPS,
2102+ DEVLINK_TRAP_GROUP_GENERIC_ID_TUNNEL_DROPS,
2103+ DEVLINK_TRAP_GROUP_GENERIC_ID_ACL_DROPS,
2104+ DEVLINK_TRAP_GROUP_GENERIC_ID_STP,
2105+ DEVLINK_TRAP_GROUP_GENERIC_ID_LACP,
2106+ DEVLINK_TRAP_GROUP_GENERIC_ID_LLDP,
2107+ DEVLINK_TRAP_GROUP_GENERIC_ID_MC_SNOOPING,
2108+ DEVLINK_TRAP_GROUP_GENERIC_ID_DHCP,
2109+ DEVLINK_TRAP_GROUP_GENERIC_ID_NEIGH_DISCOVERY,
2110+ DEVLINK_TRAP_GROUP_GENERIC_ID_BFD,
2111+ DEVLINK_TRAP_GROUP_GENERIC_ID_OSPF,
2112+ DEVLINK_TRAP_GROUP_GENERIC_ID_BGP,
2113+ DEVLINK_TRAP_GROUP_GENERIC_ID_VRRP,
2114+ DEVLINK_TRAP_GROUP_GENERIC_ID_PIM,
2115+ DEVLINK_TRAP_GROUP_GENERIC_ID_UC_LB,
2116+ DEVLINK_TRAP_GROUP_GENERIC_ID_LOCAL_DELIVERY,
2117+ DEVLINK_TRAP_GROUP_GENERIC_ID_EXTERNAL_DELIVERY,
2118+ DEVLINK_TRAP_GROUP_GENERIC_ID_IPV6,
2119+ DEVLINK_TRAP_GROUP_GENERIC_ID_PTP_EVENT,
2120+ DEVLINK_TRAP_GROUP_GENERIC_ID_PTP_GENERAL,
2121+ DEVLINK_TRAP_GROUP_GENERIC_ID_ACL_SAMPLE,
2122+ DEVLINK_TRAP_GROUP_GENERIC_ID_ACL_TRAP,
2123+ DEVLINK_TRAP_GROUP_GENERIC_ID_PARSER_ERROR_DROPS,
2124
2125 /* Add new generic trap group IDs above */
2126 __DEVLINK_TRAP_GROUP_GENERIC_ID_MAX,
2127@@ -609,26 +946,239 @@ enum devlink_trap_group_generic_id {
2128 "ttl_value_is_too_small"
2129 #define DEVLINK_TRAP_GENERIC_NAME_TAIL_DROP \
2130 "tail_drop"
2131+#define DEVLINK_TRAP_GENERIC_NAME_NON_IP_PACKET \
2132+ "non_ip"
2133+#define DEVLINK_TRAP_GENERIC_NAME_UC_DIP_MC_DMAC \
2134+ "uc_dip_over_mc_dmac"
2135+#define DEVLINK_TRAP_GENERIC_NAME_DIP_LB \
2136+ "dip_is_loopback_address"
2137+#define DEVLINK_TRAP_GENERIC_NAME_SIP_MC \
2138+ "sip_is_mc"
2139+#define DEVLINK_TRAP_GENERIC_NAME_SIP_LB \
2140+ "sip_is_loopback_address"
2141+#define DEVLINK_TRAP_GENERIC_NAME_CORRUPTED_IP_HDR \
2142+ "ip_header_corrupted"
2143+#define DEVLINK_TRAP_GENERIC_NAME_IPV4_SIP_BC \
2144+ "ipv4_sip_is_limited_bc"
2145+#define DEVLINK_TRAP_GENERIC_NAME_IPV6_MC_DIP_RESERVED_SCOPE \
2146+ "ipv6_mc_dip_reserved_scope"
2147+#define DEVLINK_TRAP_GENERIC_NAME_IPV6_MC_DIP_INTERFACE_LOCAL_SCOPE \
2148+ "ipv6_mc_dip_interface_local_scope"
2149+#define DEVLINK_TRAP_GENERIC_NAME_MTU_ERROR \
2150+ "mtu_value_is_too_small"
2151+#define DEVLINK_TRAP_GENERIC_NAME_UNRESOLVED_NEIGH \
2152+ "unresolved_neigh"
2153+#define DEVLINK_TRAP_GENERIC_NAME_RPF \
2154+ "mc_reverse_path_forwarding"
2155+#define DEVLINK_TRAP_GENERIC_NAME_REJECT_ROUTE \
2156+ "reject_route"
2157+#define DEVLINK_TRAP_GENERIC_NAME_IPV4_LPM_UNICAST_MISS \
2158+ "ipv4_lpm_miss"
2159+#define DEVLINK_TRAP_GENERIC_NAME_IPV6_LPM_UNICAST_MISS \
2160+ "ipv6_lpm_miss"
2161+#define DEVLINK_TRAP_GENERIC_NAME_NON_ROUTABLE \
2162+ "non_routable_packet"
2163+#define DEVLINK_TRAP_GENERIC_NAME_DECAP_ERROR \
2164+ "decap_error"
2165+#define DEVLINK_TRAP_GENERIC_NAME_OVERLAY_SMAC_MC \
2166+ "overlay_smac_is_mc"
2167+#define DEVLINK_TRAP_GENERIC_NAME_INGRESS_FLOW_ACTION_DROP \
2168+ "ingress_flow_action_drop"
2169+#define DEVLINK_TRAP_GENERIC_NAME_EGRESS_FLOW_ACTION_DROP \
2170+ "egress_flow_action_drop"
2171+#define DEVLINK_TRAP_GENERIC_NAME_STP \
2172+ "stp"
2173+#define DEVLINK_TRAP_GENERIC_NAME_LACP \
2174+ "lacp"
2175+#define DEVLINK_TRAP_GENERIC_NAME_LLDP \
2176+ "lldp"
2177+#define DEVLINK_TRAP_GENERIC_NAME_IGMP_QUERY \
2178+ "igmp_query"
2179+#define DEVLINK_TRAP_GENERIC_NAME_IGMP_V1_REPORT \
2180+ "igmp_v1_report"
2181+#define DEVLINK_TRAP_GENERIC_NAME_IGMP_V2_REPORT \
2182+ "igmp_v2_report"
2183+#define DEVLINK_TRAP_GENERIC_NAME_IGMP_V3_REPORT \
2184+ "igmp_v3_report"
2185+#define DEVLINK_TRAP_GENERIC_NAME_IGMP_V2_LEAVE \
2186+ "igmp_v2_leave"
2187+#define DEVLINK_TRAP_GENERIC_NAME_MLD_QUERY \
2188+ "mld_query"
2189+#define DEVLINK_TRAP_GENERIC_NAME_MLD_V1_REPORT \
2190+ "mld_v1_report"
2191+#define DEVLINK_TRAP_GENERIC_NAME_MLD_V2_REPORT \
2192+ "mld_v2_report"
2193+#define DEVLINK_TRAP_GENERIC_NAME_MLD_V1_DONE \
2194+ "mld_v1_done"
2195+#define DEVLINK_TRAP_GENERIC_NAME_IPV4_DHCP \
2196+ "ipv4_dhcp"
2197+#define DEVLINK_TRAP_GENERIC_NAME_IPV6_DHCP \
2198+ "ipv6_dhcp"
2199+#define DEVLINK_TRAP_GENERIC_NAME_ARP_REQUEST \
2200+ "arp_request"
2201+#define DEVLINK_TRAP_GENERIC_NAME_ARP_RESPONSE \
2202+ "arp_response"
2203+#define DEVLINK_TRAP_GENERIC_NAME_ARP_OVERLAY \
2204+ "arp_overlay"
2205+#define DEVLINK_TRAP_GENERIC_NAME_IPV6_NEIGH_SOLICIT \
2206+ "ipv6_neigh_solicit"
2207+#define DEVLINK_TRAP_GENERIC_NAME_IPV6_NEIGH_ADVERT \
2208+ "ipv6_neigh_advert"
2209+#define DEVLINK_TRAP_GENERIC_NAME_IPV4_BFD \
2210+ "ipv4_bfd"
2211+#define DEVLINK_TRAP_GENERIC_NAME_IPV6_BFD \
2212+ "ipv6_bfd"
2213+#define DEVLINK_TRAP_GENERIC_NAME_IPV4_OSPF \
2214+ "ipv4_ospf"
2215+#define DEVLINK_TRAP_GENERIC_NAME_IPV6_OSPF \
2216+ "ipv6_ospf"
2217+#define DEVLINK_TRAP_GENERIC_NAME_IPV4_BGP \
2218+ "ipv4_bgp"
2219+#define DEVLINK_TRAP_GENERIC_NAME_IPV6_BGP \
2220+ "ipv6_bgp"
2221+#define DEVLINK_TRAP_GENERIC_NAME_IPV4_VRRP \
2222+ "ipv4_vrrp"
2223+#define DEVLINK_TRAP_GENERIC_NAME_IPV6_VRRP \
2224+ "ipv6_vrrp"
2225+#define DEVLINK_TRAP_GENERIC_NAME_IPV4_PIM \
2226+ "ipv4_pim"
2227+#define DEVLINK_TRAP_GENERIC_NAME_IPV6_PIM \
2228+ "ipv6_pim"
2229+#define DEVLINK_TRAP_GENERIC_NAME_UC_LB \
2230+ "uc_loopback"
2231+#define DEVLINK_TRAP_GENERIC_NAME_LOCAL_ROUTE \
2232+ "local_route"
2233+#define DEVLINK_TRAP_GENERIC_NAME_EXTERNAL_ROUTE \
2234+ "external_route"
2235+#define DEVLINK_TRAP_GENERIC_NAME_IPV6_UC_DIP_LINK_LOCAL_SCOPE \
2236+ "ipv6_uc_dip_link_local_scope"
2237+#define DEVLINK_TRAP_GENERIC_NAME_IPV6_DIP_ALL_NODES \
2238+ "ipv6_dip_all_nodes"
2239+#define DEVLINK_TRAP_GENERIC_NAME_IPV6_DIP_ALL_ROUTERS \
2240+ "ipv6_dip_all_routers"
2241+#define DEVLINK_TRAP_GENERIC_NAME_IPV6_ROUTER_SOLICIT \
2242+ "ipv6_router_solicit"
2243+#define DEVLINK_TRAP_GENERIC_NAME_IPV6_ROUTER_ADVERT \
2244+ "ipv6_router_advert"
2245+#define DEVLINK_TRAP_GENERIC_NAME_IPV6_REDIRECT \
2246+ "ipv6_redirect"
2247+#define DEVLINK_TRAP_GENERIC_NAME_IPV4_ROUTER_ALERT \
2248+ "ipv4_router_alert"
2249+#define DEVLINK_TRAP_GENERIC_NAME_IPV6_ROUTER_ALERT \
2250+ "ipv6_router_alert"
2251+#define DEVLINK_TRAP_GENERIC_NAME_PTP_EVENT \
2252+ "ptp_event"
2253+#define DEVLINK_TRAP_GENERIC_NAME_PTP_GENERAL \
2254+ "ptp_general"
2255+#define DEVLINK_TRAP_GENERIC_NAME_FLOW_ACTION_SAMPLE \
2256+ "flow_action_sample"
2257+#define DEVLINK_TRAP_GENERIC_NAME_FLOW_ACTION_TRAP \
2258+ "flow_action_trap"
2259+#define DEVLINK_TRAP_GENERIC_NAME_EARLY_DROP \
2260+ "early_drop"
2261+#define DEVLINK_TRAP_GENERIC_NAME_VXLAN_PARSING \
2262+ "vxlan_parsing"
2263+#define DEVLINK_TRAP_GENERIC_NAME_LLC_SNAP_PARSING \
2264+ "llc_snap_parsing"
2265+#define DEVLINK_TRAP_GENERIC_NAME_VLAN_PARSING \
2266+ "vlan_parsing"
2267+#define DEVLINK_TRAP_GENERIC_NAME_PPPOE_PPP_PARSING \
2268+ "pppoe_ppp_parsing"
2269+#define DEVLINK_TRAP_GENERIC_NAME_MPLS_PARSING \
2270+ "mpls_parsing"
2271+#define DEVLINK_TRAP_GENERIC_NAME_ARP_PARSING \
2272+ "arp_parsing"
2273+#define DEVLINK_TRAP_GENERIC_NAME_IP_1_PARSING \
2274+ "ip_1_parsing"
2275+#define DEVLINK_TRAP_GENERIC_NAME_IP_N_PARSING \
2276+ "ip_n_parsing"
2277+#define DEVLINK_TRAP_GENERIC_NAME_GRE_PARSING \
2278+ "gre_parsing"
2279+#define DEVLINK_TRAP_GENERIC_NAME_UDP_PARSING \
2280+ "udp_parsing"
2281+#define DEVLINK_TRAP_GENERIC_NAME_TCP_PARSING \
2282+ "tcp_parsing"
2283+#define DEVLINK_TRAP_GENERIC_NAME_IPSEC_PARSING \
2284+ "ipsec_parsing"
2285+#define DEVLINK_TRAP_GENERIC_NAME_SCTP_PARSING \
2286+ "sctp_parsing"
2287+#define DEVLINK_TRAP_GENERIC_NAME_DCCP_PARSING \
2288+ "dccp_parsing"
2289+#define DEVLINK_TRAP_GENERIC_NAME_GTP_PARSING \
2290+ "gtp_parsing"
2291+#define DEVLINK_TRAP_GENERIC_NAME_ESP_PARSING \
2292+ "esp_parsing"
2293+#define DEVLINK_TRAP_GENERIC_NAME_BLACKHOLE_NEXTHOP \
2294+ "blackhole_nexthop"
2295+#define DEVLINK_TRAP_GENERIC_NAME_DMAC_FILTER \
2296+ "dmac_filter"
2297
2298 #define DEVLINK_TRAP_GROUP_GENERIC_NAME_L2_DROPS \
2299 "l2_drops"
2300 #define DEVLINK_TRAP_GROUP_GENERIC_NAME_L3_DROPS \
2301 "l3_drops"
2302+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_L3_EXCEPTIONS \
2303+ "l3_exceptions"
2304 #define DEVLINK_TRAP_GROUP_GENERIC_NAME_BUFFER_DROPS \
2305 "buffer_drops"
2306-
2307-#define DEVLINK_TRAP_GENERIC(_type, _init_action, _id, _group, _metadata_cap) \
2308+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_TUNNEL_DROPS \
2309+ "tunnel_drops"
2310+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_ACL_DROPS \
2311+ "acl_drops"
2312+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_STP \
2313+ "stp"
2314+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_LACP \
2315+ "lacp"
2316+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_LLDP \
2317+ "lldp"
2318+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_MC_SNOOPING \
2319+ "mc_snooping"
2320+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_DHCP \
2321+ "dhcp"
2322+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_NEIGH_DISCOVERY \
2323+ "neigh_discovery"
2324+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_BFD \
2325+ "bfd"
2326+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_OSPF \
2327+ "ospf"
2328+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_BGP \
2329+ "bgp"
2330+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_VRRP \
2331+ "vrrp"
2332+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_PIM \
2333+ "pim"
2334+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_UC_LB \
2335+ "uc_loopback"
2336+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_LOCAL_DELIVERY \
2337+ "local_delivery"
2338+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_EXTERNAL_DELIVERY \
2339+ "external_delivery"
2340+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_IPV6 \
2341+ "ipv6"
2342+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_PTP_EVENT \
2343+ "ptp_event"
2344+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_PTP_GENERAL \
2345+ "ptp_general"
2346+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_ACL_SAMPLE \
2347+ "acl_sample"
2348+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_ACL_TRAP \
2349+ "acl_trap"
2350+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_PARSER_ERROR_DROPS \
2351+ "parser_error_drops"
2352+
2353+#define DEVLINK_TRAP_GENERIC(_type, _init_action, _id, _group_id, \
2354+ _metadata_cap) \
2355 { \
2356 .type = DEVLINK_TRAP_TYPE_##_type, \
2357 .init_action = DEVLINK_TRAP_ACTION_##_init_action, \
2358 .generic = true, \
2359 .id = DEVLINK_TRAP_GENERIC_ID_##_id, \
2360 .name = DEVLINK_TRAP_GENERIC_NAME_##_id, \
2361- .group = _group, \
2362+ .init_group_id = _group_id, \
2363 .metadata_cap = _metadata_cap, \
2364 }
2365
2366-#define DEVLINK_TRAP_DRIVER(_type, _init_action, _id, _name, _group, \
2367+#define DEVLINK_TRAP_DRIVER(_type, _init_action, _id, _name, _group_id, \
2368 _metadata_cap) \
2369 { \
2370 .type = DEVLINK_TRAP_TYPE_##_type, \
2371@@ -636,21 +1186,45 @@ enum devlink_trap_group_generic_id {
2372 .generic = false, \
2373 .id = _id, \
2374 .name = _name, \
2375- .group = _group, \
2376+ .init_group_id = _group_id, \
2377 .metadata_cap = _metadata_cap, \
2378 }
2379
2380-#define DEVLINK_TRAP_GROUP_GENERIC(_id) \
2381+#define DEVLINK_TRAP_GROUP_GENERIC(_id, _policer_id) \
2382 { \
2383 .name = DEVLINK_TRAP_GROUP_GENERIC_NAME_##_id, \
2384 .id = DEVLINK_TRAP_GROUP_GENERIC_ID_##_id, \
2385 .generic = true, \
2386+ .init_policer_id = _policer_id, \
2387+ }
2388+
2389+#define DEVLINK_TRAP_POLICER(_id, _rate, _burst, _max_rate, _min_rate, \
2390+ _max_burst, _min_burst) \
2391+ { \
2392+ .id = _id, \
2393+ .init_rate = _rate, \
2394+ .init_burst = _burst, \
2395+ .max_rate = _max_rate, \
2396+ .min_rate = _min_rate, \
2397+ .max_burst = _max_burst, \
2398+ .min_burst = _min_burst, \
2399 }
2400
2401 struct devlink_ops {
2402- int (*reload_down)(struct devlink *devlink,
2403+ /**
2404+ * @supported_flash_update_params:
2405+ * mask of parameters supported by the driver's .flash_update
2406+ * implementation.
2407+ */
2408+ u32 supported_flash_update_params;
2409+ unsigned long reload_actions;
2410+ unsigned long reload_limits;
2411+ int (*reload_down)(struct devlink *devlink, bool netns_change,
2412+ enum devlink_reload_action action,
2413+ enum devlink_reload_limit limit,
2414 struct netlink_ext_ack *extack);
2415- int (*reload_up)(struct devlink *devlink,
2416+ int (*reload_up)(struct devlink *devlink, enum devlink_reload_action action,
2417+ enum devlink_reload_limit limit, u32 *actions_performed,
2418 struct netlink_ext_ack *extack);
2419 int (*port_type_set)(struct devlink_port *devlink_port,
2420 enum devlink_port_type port_type);
2421@@ -708,8 +1282,15 @@ struct devlink_ops {
2422 struct netlink_ext_ack *extack);
2423 int (*info_get)(struct devlink *devlink, struct devlink_info_req *req,
2424 struct netlink_ext_ack *extack);
2425- int (*flash_update)(struct devlink *devlink, const char *file_name,
2426- const char *component,
2427+ /**
2428+ * @flash_update: Device flash update function
2429+ *
2430+ * Used to perform a flash update for the device. The set of
2431+ * parameters supported by the driver should be set in
2432+ * supported_flash_update_params.
2433+ */
2434+ int (*flash_update)(struct devlink *devlink,
2435+ struct devlink_flash_update_params *params,
2436 struct netlink_ext_ack *extack);
2437 /**
2438 * @trap_init: Trap initialization function.
2439@@ -734,7 +1315,8 @@ struct devlink_ops {
2440 */
2441 int (*trap_action_set)(struct devlink *devlink,
2442 const struct devlink_trap *trap,
2443- enum devlink_trap_action action);
2444+ enum devlink_trap_action action,
2445+ struct netlink_ext_ack *extack);
2446 /**
2447 * @trap_group_init: Trap group initialization function.
2448 *
2449@@ -743,6 +1325,187 @@ struct devlink_ops {
2450 */
2451 int (*trap_group_init)(struct devlink *devlink,
2452 const struct devlink_trap_group *group);
2453+ /**
2454+ * @trap_group_set: Trap group parameters set function.
2455+ *
2456+ * Note: @policer can be NULL when a policer is being unbound from
2457+ * @group.
2458+ */
2459+ int (*trap_group_set)(struct devlink *devlink,
2460+ const struct devlink_trap_group *group,
2461+ const struct devlink_trap_policer *policer,
2462+ struct netlink_ext_ack *extack);
2463+ /**
2464+ * @trap_group_action_set: Trap group action set function.
2465+ *
2466+ * If this callback is populated, it will take precedence over looping
2467+ * over all traps in a group and calling .trap_action_set().
2468+ */
2469+ int (*trap_group_action_set)(struct devlink *devlink,
2470+ const struct devlink_trap_group *group,
2471+ enum devlink_trap_action action,
2472+ struct netlink_ext_ack *extack);
2473+ /**
2474+ * @trap_drop_counter_get: Trap drop counter get function.
2475+ *
2476+ * Should be used by device drivers to report number of packets
2477+ * that have been dropped, and cannot be passed to the devlink
2478+ * subsystem by the underlying device.
2479+ */
2480+ int (*trap_drop_counter_get)(struct devlink *devlink,
2481+ const struct devlink_trap *trap,
2482+ u64 *p_drops);
2483+ /**
2484+ * @trap_policer_init: Trap policer initialization function.
2485+ *
2486+ * Should be used by device drivers to initialize the trap policer in
2487+ * the underlying device.
2488+ */
2489+ int (*trap_policer_init)(struct devlink *devlink,
2490+ const struct devlink_trap_policer *policer);
2491+ /**
2492+ * @trap_policer_fini: Trap policer de-initialization function.
2493+ *
2494+ * Should be used by device drivers to de-initialize the trap policer
2495+ * in the underlying device.
2496+ */
2497+ void (*trap_policer_fini)(struct devlink *devlink,
2498+ const struct devlink_trap_policer *policer);
2499+ /**
2500+ * @trap_policer_set: Trap policer parameters set function.
2501+ */
2502+ int (*trap_policer_set)(struct devlink *devlink,
2503+ const struct devlink_trap_policer *policer,
2504+ u64 rate, u64 burst,
2505+ struct netlink_ext_ack *extack);
2506+ /**
2507+ * @trap_policer_counter_get: Trap policer counter get function.
2508+ *
2509+ * Should be used by device drivers to report number of packets dropped
2510+ * by the policer.
2511+ */
2512+ int (*trap_policer_counter_get)(struct devlink *devlink,
2513+ const struct devlink_trap_policer *policer,
2514+ u64 *p_drops);
2515+ /**
2516+ * @port_function_hw_addr_get: Port function's hardware address get function.
2517+ *
2518+ * Should be used by device drivers to report the hardware address of a function managed
2519+ * by the devlink port. Driver should return -EOPNOTSUPP if it doesn't support port
2520+ * function handling for a particular port.
2521+ *
2522+ * Note: @extack can be NULL when port notifier queries the port function.
2523+ */
2524+ int (*port_function_hw_addr_get)(struct devlink *devlink, struct devlink_port *port,
2525+ u8 *hw_addr, int *hw_addr_len,
2526+ struct netlink_ext_ack *extack);
2527+ /**
2528+ * @port_function_hw_addr_set: Port function's hardware address set function.
2529+ *
2530+ * Should be used by device drivers to set the hardware address of a function managed
2531+ * by the devlink port. Driver should return -EOPNOTSUPP if it doesn't support port
2532+ * function handling for a particular port.
2533+ */
2534+ int (*port_function_hw_addr_set)(struct devlink *devlink, struct devlink_port *port,
2535+ const u8 *hw_addr, int hw_addr_len,
2536+ struct netlink_ext_ack *extack);
2537+ /**
2538+ * port_new() - Add a new port function of a specified flavor
2539+ * @devlink: Devlink instance
2540+ * @attrs: attributes of the new port
2541+ * @extack: extack for reporting error messages
2542+ * @new_port_index: index of the new port
2543+ *
2544+ * Devlink core will call this device driver function upon user request
2545+ * to create a new port function of a specified flavor and optional
2546+ * attributes
2547+ *
2548+ * Notes:
2549+ * - Called without devlink instance lock being held. Drivers must
2550+ * implement their own means of synchronization
2551+ * - On success, drivers must register a port with devlink core
2552+ *
2553+ * Return: 0 on success, negative value otherwise.
2554+ */
2555+ int (*port_new)(struct devlink *devlink,
2556+ const struct devlink_port_new_attrs *attrs,
2557+ struct netlink_ext_ack *extack,
2558+ unsigned int *new_port_index);
2559+ /**
2560+ * port_del() - Delete a port function
2561+ * @devlink: Devlink instance
2562+ * @port_index: port function index to delete
2563+ * @extack: extack for reporting error messages
2564+ *
2565+ * Devlink core will call this device driver function upon user request
2566+ * to delete a previously created port function
2567+ *
2568+ * Notes:
2569+ * - Called without devlink instance lock being held. Drivers must
2570+ * implement their own means of synchronization
2571+ * - On success, drivers must unregister the corresponding devlink
2572+ * port
2573+ *
2574+ * Return: 0 on success, negative value otherwise.
2575+ */
2576+ int (*port_del)(struct devlink *devlink, unsigned int port_index,
2577+ struct netlink_ext_ack *extack);
2578+ /**
2579+ * port_fn_state_get() - Get the state of a port function
2580+ * @devlink: Devlink instance
2581+ * @port: The devlink port
2582+ * @state: Admin configured state
2583+ * @opstate: Current operational state
2584+ * @extack: extack for reporting error messages
2585+ *
2586+ * Reports the admin and operational state of a devlink port function
2587+ *
2588+ * Return: 0 on success, negative value otherwise.
2589+ */
2590+ int (*port_fn_state_get)(struct devlink *devlink,
2591+ struct devlink_port *port,
2592+ enum devlink_port_fn_state *state,
2593+ enum devlink_port_fn_opstate *opstate,
2594+ struct netlink_ext_ack *extack);
2595+ /**
2596+ * port_fn_state_set() - Set the admin state of a port function
2597+ * @devlink: Devlink instance
2598+ * @port: The devlink port
2599+ * @state: Admin state
2600+ * @extack: extack for reporting error messages
2601+ *
2602+ * Set the admin state of a devlink port function
2603+ *
2604+ * Return: 0 on success, negative value otherwise.
2605+ */
2606+ int (*port_fn_state_set)(struct devlink *devlink,
2607+ struct devlink_port *port,
2608+ enum devlink_port_fn_state state,
2609+ struct netlink_ext_ack *extack);
2610+
2611+ /**
2612+ * Rate control callbacks.
2613+ */
2614+ int (*rate_leaf_tx_share_set)(struct devlink_rate *devlink_rate, void *priv,
2615+ u64 tx_share, struct netlink_ext_ack *extack);
2616+ int (*rate_leaf_tx_max_set)(struct devlink_rate *devlink_rate, void *priv,
2617+ u64 tx_max, struct netlink_ext_ack *extack);
2618+ int (*rate_node_tx_share_set)(struct devlink_rate *devlink_rate, void *priv,
2619+ u64 tx_share, struct netlink_ext_ack *extack);
2620+ int (*rate_node_tx_max_set)(struct devlink_rate *devlink_rate, void *priv,
2621+ u64 tx_max, struct netlink_ext_ack *extack);
2622+ int (*rate_node_new)(struct devlink_rate *rate_node, void **priv,
2623+ struct netlink_ext_ack *extack);
2624+ int (*rate_node_del)(struct devlink_rate *rate_node, void *priv,
2625+ struct netlink_ext_ack *extack);
2626+ int (*rate_leaf_parent_set)(struct devlink_rate *child,
2627+ struct devlink_rate *parent,
2628+ void *priv_child, void *priv_parent,
2629+ struct netlink_ext_ack *extack);
2630+ int (*rate_node_parent_set)(struct devlink_rate *child,
2631+ struct devlink_rate *parent,
2632+ void *priv_child, void *priv_parent,
2633+ struct netlink_ext_ack *extack);
2634 };
2635
2636 static inline void *devlink_priv(struct devlink *devlink)
2637@@ -776,7 +1539,19 @@ static inline struct devlink *netdev_to_devlink(struct net_device *dev)
2638
2639 struct ib_device;
2640
2641-struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size);
2642+struct net *devlink_net(const struct devlink *devlink);
2643+/* This call is intended for software devices that can create
2644+ * devlink instances in other namespaces than init_net.
2645+ *
2646+ * Drivers that operate on real HW must use devlink_alloc() instead.
2647+ */
2648+struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
2649+ size_t priv_size, struct net *net);
2650+static inline struct devlink *devlink_alloc(const struct devlink_ops *ops,
2651+ size_t priv_size)
2652+{
2653+ return devlink_alloc_ns(ops, priv_size, &init_net);
2654+}
2655 int devlink_register(struct devlink *devlink, struct device *dev);
2656 void devlink_unregister(struct devlink *devlink);
2657 void devlink_reload_enable(struct devlink *devlink);
2658@@ -792,18 +1567,17 @@ void devlink_port_type_ib_set(struct devlink_port *devlink_port,
2659 struct ib_device *ibdev);
2660 void devlink_port_type_clear(struct devlink_port *devlink_port);
2661 void devlink_port_attrs_set(struct devlink_port *devlink_port,
2662- enum devlink_port_flavour flavour,
2663- u32 port_number, bool split,
2664- u32 split_subport_number,
2665- const unsigned char *switch_id,
2666- unsigned char switch_id_len);
2667-void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port,
2668- const unsigned char *switch_id,
2669- unsigned char switch_id_len, u16 pf);
2670-void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port,
2671- const unsigned char *switch_id,
2672- unsigned char switch_id_len,
2673- u16 pf, u16 vf);
2674+ struct devlink_port_attrs *devlink_port_attrs);
2675+void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u32 controller,
2676+ u16 pf, bool external);
2677+void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 controller,
2678+ u16 pf, u16 vf, bool external);
2679+void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port,
2680+ u32 controller, u16 pf, u32 sf,
2681+ bool external);
2682+int devlink_rate_leaf_create(struct devlink_port *port, void *priv);
2683+void devlink_rate_leaf_destroy(struct devlink_port *devlink_port);
2684+void devlink_rate_nodes_destroy(struct devlink *devlink);
2685 int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
2686 u32 size, u16 ingress_pools_count,
2687 u16 egress_pools_count, u16 ingress_tc_count,
2688@@ -883,19 +1657,27 @@ void devlink_port_param_value_changed(struct devlink_port *devlink_port,
2689 u32 param_id);
2690 void devlink_param_value_str_fill(union devlink_param_value *dst_val,
2691 const char *src);
2692-struct devlink_region *devlink_region_create(struct devlink *devlink,
2693- const char *region_name,
2694- u32 region_max_snapshots,
2695- u64 region_size);
2696+struct devlink_region *
2697+devlink_region_create(struct devlink *devlink,
2698+ const struct devlink_region_ops *ops,
2699+ u32 region_max_snapshots, u64 region_size);
2700+struct devlink_region *
2701+devlink_port_region_create(struct devlink_port *port,
2702+ const struct devlink_port_region_ops *ops,
2703+ u32 region_max_snapshots, u64 region_size);
2704 void devlink_region_destroy(struct devlink_region *region);
2705-u32 devlink_region_shapshot_id_get(struct devlink *devlink);
2706+void devlink_port_region_destroy(struct devlink_region *region);
2707+
2708+int devlink_region_snapshot_id_get(struct devlink *devlink, u32 *id);
2709+void devlink_region_snapshot_id_put(struct devlink *devlink, u32 id);
2710 int devlink_region_snapshot_create(struct devlink_region *region,
2711- u8 *data, u32 snapshot_id,
2712- devlink_snapshot_data_dest_t *data_destructor);
2713+ u8 *data, u32 snapshot_id);
2714 int devlink_info_serial_number_put(struct devlink_info_req *req,
2715 const char *sn);
2716 int devlink_info_driver_name_put(struct devlink_info_req *req,
2717 const char *name);
2718+int devlink_info_board_serial_number_put(struct devlink_info_req *req,
2719+ const char *bsn);
2720 int devlink_info_version_fixed_put(struct devlink_info_req *req,
2721 const char *version_name,
2722 const char *version_value);
2723@@ -915,6 +1697,9 @@ int devlink_fmsg_pair_nest_end(struct devlink_fmsg *fmsg);
2724 int devlink_fmsg_arr_pair_nest_start(struct devlink_fmsg *fmsg,
2725 const char *name);
2726 int devlink_fmsg_arr_pair_nest_end(struct devlink_fmsg *fmsg);
2727+int devlink_fmsg_binary_pair_nest_start(struct devlink_fmsg *fmsg,
2728+ const char *name);
2729+int devlink_fmsg_binary_pair_nest_end(struct devlink_fmsg *fmsg);
2730
2731 int devlink_fmsg_bool_put(struct devlink_fmsg *fmsg, bool value);
2732 int devlink_fmsg_u8_put(struct devlink_fmsg *fmsg, u8 value);
2733@@ -935,16 +1720,24 @@ int devlink_fmsg_u64_pair_put(struct devlink_fmsg *fmsg, const char *name,
2734 int devlink_fmsg_string_pair_put(struct devlink_fmsg *fmsg, const char *name,
2735 const char *value);
2736 int devlink_fmsg_binary_pair_put(struct devlink_fmsg *fmsg, const char *name,
2737- const void *value, u16 value_len);
2738+ const void *value, u32 value_len);
2739
2740 struct devlink_health_reporter *
2741 devlink_health_reporter_create(struct devlink *devlink,
2742 const struct devlink_health_reporter_ops *ops,
2743- u64 graceful_period, bool auto_recover,
2744- void *priv);
2745+ u64 graceful_period, void *priv);
2746+
2747+struct devlink_health_reporter *
2748+devlink_port_health_reporter_create(struct devlink_port *port,
2749+ const struct devlink_health_reporter_ops *ops,
2750+ u64 graceful_period, void *priv);
2751+
2752 void
2753 devlink_health_reporter_destroy(struct devlink_health_reporter *reporter);
2754
2755+void
2756+devlink_port_health_reporter_destroy(struct devlink_health_reporter *reporter);
2757+
2758 void *
2759 devlink_health_reporter_priv(struct devlink_health_reporter *reporter);
2760 int devlink_health_report(struct devlink_health_reporter *reporter,
2761@@ -952,16 +1745,23 @@ int devlink_health_report(struct devlink_health_reporter *reporter,
2762 void
2763 devlink_health_reporter_state_update(struct devlink_health_reporter *reporter,
2764 enum devlink_health_reporter_state state);
2765+void
2766+devlink_health_reporter_recovery_done(struct devlink_health_reporter *reporter);
2767
2768 bool devlink_is_reload_failed(const struct devlink *devlink);
2769+void devlink_remote_reload_actions_performed(struct devlink *devlink,
2770+ enum devlink_reload_limit limit,
2771+ u32 actions_performed);
2772
2773-void devlink_flash_update_begin_notify(struct devlink *devlink);
2774-void devlink_flash_update_end_notify(struct devlink *devlink);
2775 void devlink_flash_update_status_notify(struct devlink *devlink,
2776 const char *status_msg,
2777 const char *component,
2778 unsigned long done,
2779 unsigned long total);
2780+void devlink_flash_update_timeout_notify(struct devlink *devlink,
2781+ const char *status_msg,
2782+ const char *component,
2783+ unsigned long timeout);
2784
2785 int devlink_traps_register(struct devlink *devlink,
2786 const struct devlink_trap *traps,
2787@@ -969,10 +1769,24 @@ int devlink_traps_register(struct devlink *devlink,
2788 void devlink_traps_unregister(struct devlink *devlink,
2789 const struct devlink_trap *traps,
2790 size_t traps_count);
2791-void devlink_trap_report(struct devlink *devlink,
2792- struct sk_buff *skb, void *trap_ctx,
2793- struct devlink_port *in_devlink_port);
2794+void devlink_trap_report(struct devlink *devlink, struct sk_buff *skb,
2795+ void *trap_ctx, struct devlink_port *in_devlink_port,
2796+ const struct flow_action_cookie *fa_cookie);
2797 void *devlink_trap_ctx_priv(void *trap_ctx);
2798+int devlink_trap_groups_register(struct devlink *devlink,
2799+ const struct devlink_trap_group *groups,
2800+ size_t groups_count);
2801+void devlink_trap_groups_unregister(struct devlink *devlink,
2802+ const struct devlink_trap_group *groups,
2803+ size_t groups_count);
2804+int
2805+devlink_trap_policers_register(struct devlink *devlink,
2806+ const struct devlink_trap_policer *policers,
2807+ size_t policers_count);
2808+void
2809+devlink_trap_policers_unregister(struct devlink *devlink,
2810+ const struct devlink_trap_policer *policers,
2811+ size_t policers_count);
2812
2813 #if IS_ENABLED(CONFIG_NET_DEVLINK)
2814
2815diff --git a/include/net/drop_monitor.h b/include/net/drop_monitor.h
2816deleted file mode 100644
2817index f68bc37..0000000
2818--- a/include/net/drop_monitor.h
2819+++ /dev/null
2820@@ -1,33 +0,0 @@
2821-/* SPDX-License-Identifier: GPL-2.0-only */
2822-
2823-#ifndef _NET_DROP_MONITOR_H_
2824-#define _NET_DROP_MONITOR_H_
2825-
2826-#include <linux/ktime.h>
2827-#include <linux/netdevice.h>
2828-#include <linux/skbuff.h>
2829-
2830-/**
2831- * struct net_dm_hw_metadata - Hardware-supplied packet metadata.
2832- * @trap_group_name: Hardware trap group name.
2833- * @trap_name: Hardware trap name.
2834- * @input_dev: Input netdevice.
2835- */
2836-struct net_dm_hw_metadata {
2837- const char *trap_group_name;
2838- const char *trap_name;
2839- struct net_device *input_dev;
2840-};
2841-
2842-#if IS_REACHABLE(CONFIG_NET_DROP_MONITOR)
2843-void net_dm_hw_report(struct sk_buff *skb,
2844- const struct net_dm_hw_metadata *hw_metadata);
2845-#else
2846-static inline void
2847-net_dm_hw_report(struct sk_buff *skb,
2848- const struct net_dm_hw_metadata *hw_metadata)
2849-{
2850-}
2851-#endif
2852-
2853-#endif /* _NET_DROP_MONITOR_H_ */
2854diff --git a/include/net/genetlink.h b/include/net/genetlink.h
2855index 2d9e67a..1f0f363 100644
2856--- a/include/net/genetlink.h
2857+++ b/include/net/genetlink.h
2858@@ -49,8 +49,11 @@ struct genl_family {
2859 char name[GENL_NAMSIZ];
2860 unsigned int version;
2861 unsigned int maxattr;
2862- bool netnsok;
2863- bool parallel_ops;
2864+ unsigned int mcgrp_offset; /* private */
2865+ u8 netnsok:1;
2866+ u8 parallel_ops:1;
2867+ u8 n_ops;
2868+ u8 n_mcgrps;
2869 const struct nla_policy *policy;
2870 int (*pre_doit)(const struct genl_ops *ops,
2871 struct sk_buff *skb,
2872@@ -61,14 +64,9 @@ struct genl_family {
2873 struct nlattr ** attrbuf; /* private */
2874 const struct genl_ops * ops;
2875 const struct genl_multicast_group *mcgrps;
2876- unsigned int n_ops;
2877- unsigned int n_mcgrps;
2878- unsigned int mcgrp_offset; /* private */
2879 struct module *module;
2880 };
2881
2882-struct nlattr **genl_family_attrbuf(const struct genl_family *family);
2883-
2884 /**
2885 * struct genl_info - receiving information
2886 * @snd_seq: sending sequence number
2887@@ -120,6 +118,24 @@ enum genl_validate_flags {
2888 };
2889
2890 /**
2891+ * struct genl_dumpit_info - info that is available during dumpit op call
2892+ * @family: generic netlink family - for internal genl code usage
2893+ * @ops: generic netlink ops - for internal genl code usage
2894+ * @attrs: netlink attributes
2895+ */
2896+struct genl_dumpit_info {
2897+ const struct genl_family *family;
2898+ const struct genl_ops *ops;
2899+ struct nlattr **attrs;
2900+};
2901+
2902+static inline const struct genl_dumpit_info *
2903+genl_dumpit_info(struct netlink_callback *cb)
2904+{
2905+ return cb->data;
2906+}
2907+
2908+/**
2909 * struct genl_ops - generic netlink operations
2910 * @cmd: command identifier
2911 * @internal_flags: flags used by the family
2912diff --git a/include/net/netlink.h b/include/net/netlink.h
2913index 22dd99b..f91d2e3 100644
2914--- a/include/net/netlink.h
2915+++ b/include/net/netlink.h
2916@@ -219,7 +219,7 @@ enum nla_policy_validation {
2917 * NLA_NESTED,
2918 * NLA_NESTED_ARRAY Length verification is done by checking len of
2919 * nested header (or empty); len field is used if
2920- * validation_data is also used, for the max attr
2921+ * nested_policy is also used, for the max attr
2922 * number in the nested policy.
2923 * NLA_U8, NLA_U16,
2924 * NLA_U32, NLA_U64,
2925@@ -237,27 +237,25 @@ enum nla_policy_validation {
2926 * NLA_MIN_LEN Minimum length of attribute payload
2927 * All other Minimum length of attribute payload
2928 *
2929- * Meaning of `validation_data' field:
2930+ * Meaning of validation union:
2931 * NLA_BITFIELD32 This is a 32-bit bitmap/bitselector attribute and
2932- * validation data must point to a u32 value of valid
2933- * flags
2934- * NLA_REJECT This attribute is always rejected and validation data
2935+ * `bitfield32_valid' is the u32 value of valid flags
2936+ * NLA_REJECT This attribute is always rejected and `reject_message'
2937 * may point to a string to report as the error instead
2938 * of the generic one in extended ACK.
2939- * NLA_NESTED Points to a nested policy to validate, must also set
2940- * `len' to the max attribute number.
2941+ * NLA_NESTED `nested_policy' points to a nested policy to validate, must
2942+ * also set `len' to the max attribute number. Use the
2943+ * provided NLA_POLICY_NESTED() macro.
2944 * Note that nla_parse() will validate, but of course not
2945 * parse, the nested sub-policies.
2946- * NLA_NESTED_ARRAY Points to a nested policy to validate, must also set
2947- * `len' to the max attribute number. The difference to
2948- * NLA_NESTED is the structure - NLA_NESTED has the
2949- * nested attributes directly inside, while an array has
2950- * the nested attributes at another level down and the
2951- * attributes directly in the nesting don't matter.
2952- * All other Unused - but note that it's a union
2953- *
2954- * Meaning of `min' and `max' fields, use via NLA_POLICY_MIN, NLA_POLICY_MAX
2955- * and NLA_POLICY_RANGE:
2956+ * NLA_NESTED_ARRAY `nested_policy' points to a nested policy to validate,
2957+ * must also set `len' to the max attribute number. Use
2958+ * the provided NLA_POLICY_NESTED_ARRAY() macro.
2959+ * The difference to NLA_NESTED is the structure:
2960+ * NLA_NESTED has the nested attributes directly inside
2961+ * while an array has the nested attributes at another
2962+ * level down and the attribute types directly in the
2963+ * nesting don't matter.
2964 * NLA_U8,
2965 * NLA_U16,
2966 * NLA_U32,
2967@@ -265,29 +263,31 @@ enum nla_policy_validation {
2968 * NLA_S8,
2969 * NLA_S16,
2970 * NLA_S32,
2971- * NLA_S64 These are used depending on the validation_type
2972- * field, if that is min/max/range then the minimum,
2973- * maximum and both are used (respectively) to check
2974+ * NLA_S64 The `min' and `max' fields are used depending on the
2975+ * validation_type field, if that is min/max/range then
2976+ * the min, max or both are used (respectively) to check
2977 * the value of the integer attribute.
2978 * Note that in the interest of code simplicity and
2979 * struct size both limits are s16, so you cannot
2980 * enforce a range that doesn't fall within the range
2981 * of s16 - do that as usual in the code instead.
2982+ * Use the NLA_POLICY_MIN(), NLA_POLICY_MAX() and
2983+ * NLA_POLICY_RANGE() macros.
2984 * All other Unused - but note that it's a union
2985 *
2986 * Meaning of `validate' field, use via NLA_POLICY_VALIDATE_FN:
2987- * NLA_BINARY Validation function called for the attribute,
2988- * not compatible with use of the validation_data
2989- * as in NLA_BITFIELD32, NLA_REJECT, NLA_NESTED and
2990- * NLA_NESTED_ARRAY.
2991+ * NLA_BINARY Validation function called for the attribute.
2992 * All other Unused - but note that it's a union
2993 *
2994 * Example:
2995+ *
2996+ * static const u32 myvalidflags = 0xff231023;
2997+ *
2998 * static const struct nla_policy my_policy[ATTR_MAX+1] = {
2999 * [ATTR_FOO] = { .type = NLA_U16 },
3000 * [ATTR_BAR] = { .type = NLA_STRING, .len = BARSIZ },
3001 * [ATTR_BAZ] = { .type = NLA_EXACT_LEN, .len = sizeof(struct mystruct) },
3002- * [ATTR_GOO] = { .type = NLA_BITFIELD32, .validation_data = &myvalidflags },
3003+ * [ATTR_GOO] = NLA_POLICY_BITFIELD32(myvalidflags),
3004 * };
3005 */
3006 struct nla_policy {
3007@@ -295,8 +295,10 @@ struct nla_policy {
3008 u8 validation_type;
3009 u16 len;
3010 union {
3011- const void *validation_data;
3012 const u32 mask;
3013+ const u32 bitfield32_valid;
3014+ const char *reject_message;
3015+ const struct nla_policy *nested_policy;
3016 struct {
3017 s16 min, max;
3018 };
3019@@ -332,13 +334,15 @@ struct nla_policy {
3020 #define NLA_POLICY_ETH_ADDR_COMPAT NLA_POLICY_EXACT_LEN_WARN(ETH_ALEN)
3021
3022 #define _NLA_POLICY_NESTED(maxattr, policy) \
3023- { .type = NLA_NESTED, .validation_data = policy, .len = maxattr }
3024+ { .type = NLA_NESTED, .nested_policy = policy, .len = maxattr }
3025 #define _NLA_POLICY_NESTED_ARRAY(maxattr, policy) \
3026- { .type = NLA_NESTED_ARRAY, .validation_data = policy, .len = maxattr }
3027+ { .type = NLA_NESTED_ARRAY, .nested_policy = policy, .len = maxattr }
3028 #define NLA_POLICY_NESTED(policy) \
3029 _NLA_POLICY_NESTED(ARRAY_SIZE(policy) - 1, policy)
3030 #define NLA_POLICY_NESTED_ARRAY(policy) \
3031 _NLA_POLICY_NESTED_ARRAY(ARRAY_SIZE(policy) - 1, policy)
3032+#define NLA_POLICY_BITFIELD32(valid) \
3033+ { .type = NLA_BITFIELD32, .bitfield32_valid = valid }
3034
3035 #define __NLA_IS_UINT_TYPE(tp) \
3036 (tp == NLA_U8 || tp == NLA_U16 || tp == NLA_U32 || tp == NLA_U64)
3037@@ -1482,6 +1486,21 @@ static inline int nla_put_in6_addr(struct sk_buff *skb, int attrtype,
3038 }
3039
3040 /**
3041+ * nla_put_bitfield32 - Add a bitfield32 netlink attribute to a socket buffer
3042+ * @skb: socket buffer to add attribute to
3043+ * @attrtype: attribute type
3044+ * @value: value carrying bits
3045+ * @selector: selector of valid bits
3046+ */
3047+static inline int nla_put_bitfield32(struct sk_buff *skb, int attrtype,
3048+ __u32 value, __u32 selector)
3049+{
3050+ struct nla_bitfield32 tmp = { value, selector, };
3051+
3052+ return nla_put(skb, attrtype, sizeof(tmp), &tmp);
3053+}
3054+
3055+/**
3056 * nla_get_u32 - return payload of u32 attribute
3057 * @nla: u32 netlink attribute
3058 */
3059diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
3060index 0aac605..d23d7fe 100644
3061--- a/include/net/pkt_cls.h
3062+++ b/include/net/pkt_cls.h
3063@@ -751,6 +751,7 @@ struct tc_red_qopt_offload_params {
3064 u32 limit;
3065 bool is_ecn;
3066 bool is_harddrop;
3067+ bool is_nodrop;
3068 struct gnet_stats_queue *qstats;
3069 };
3070
3071diff --git a/include/net/red.h b/include/net/red.h
3072index ff07a7c..cc9f6b0 100644
3073--- a/include/net/red.h
3074+++ b/include/net/red.h
3075@@ -189,6 +189,44 @@ static inline bool red_check_params(u32 qth_min, u32 qth_max, u8 Wlog,
3076 return true;
3077 }
3078
3079+static inline int red_get_flags(unsigned char qopt_flags,
3080+ unsigned char historic_mask,
3081+ struct nlattr *flags_attr,
3082+ unsigned char supported_mask,
3083+ struct nla_bitfield32 *p_flags,
3084+ unsigned char *p_userbits,
3085+ struct netlink_ext_ack *extack)
3086+{
3087+ struct nla_bitfield32 flags;
3088+
3089+ if (qopt_flags && flags_attr) {
3090+ NL_SET_ERR_MSG_MOD(extack, "flags should be passed either through qopt, or through a dedicated attribute");
3091+ return -EINVAL;
3092+ }
3093+
3094+ if (flags_attr) {
3095+ flags = nla_get_bitfield32(flags_attr);
3096+ } else {
3097+ flags.selector = historic_mask;
3098+ flags.value = qopt_flags & historic_mask;
3099+ }
3100+
3101+ *p_flags = flags;
3102+ *p_userbits = qopt_flags & ~historic_mask;
3103+ return 0;
3104+}
3105+
3106+static inline int red_validate_flags(unsigned char flags,
3107+ struct netlink_ext_ack *extack)
3108+{
3109+ if ((flags & TC_RED_NODROP) && !(flags & TC_RED_ECN)) {
3110+ NL_SET_ERR_MSG_MOD(extack, "nodrop mode is only meaningful with ECN");
3111+ return -EINVAL;
3112+ }
3113+
3114+ return 0;
3115+}
3116+
3117 static inline void red_set_parms(struct red_parms *p,
3118 u32 qth_min, u32 qth_max, u8 Wlog, u8 Plog,
3119 u8 Scell_log, u8 *stab, u32 max_P)
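Review note on the include/net/red.h hunk above: the two new helpers split RED flag handling into "where do the flags come from" (red_get_flags) and "is the combination sane" (red_validate_flags). The sketch below mirrors that logic in plain userspace C so the behaviour can be checked outside the kernel. It is an illustration only, not the kernel code: struct bitfield32, red_flags_resolve(), red_flags_validate() and the RED_* constants are hypothetical stand-ins for struct nla_bitfield32, the two helpers and the TC_RED_* defines, and a NULL attr pointer stands in for an absent TCA_RED_FLAGS attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirrors of TC_RED_* and TC_RED_HISTORIC_FLAGS. */
#define RED_ECN        1
#define RED_HARDDROP   2
#define RED_ADAPTATIVE 4
#define RED_NODROP     8
#define RED_HISTORIC   (RED_ECN | RED_HARDDROP | RED_ADAPTATIVE)

/* Hypothetical stand-in for struct nla_bitfield32. */
struct bitfield32 {
	uint32_t value;
	uint32_t selector;
};

/* Flags may arrive via the legacy qopt byte or via a dedicated attribute,
 * but not both. Bits of the qopt byte outside the historic mask are handed
 * back untouched as "user bits". Returns 0 on success, -1 on conflict.
 */
int red_flags_resolve(unsigned char qopt_flags, const struct bitfield32 *attr,
		      struct bitfield32 *out, unsigned char *userbits)
{
	if (qopt_flags && attr)
		return -1;

	if (attr) {
		*out = *attr;
	} else {
		out->selector = RED_HISTORIC;
		out->value = qopt_flags & RED_HISTORIC;
	}

	*userbits = qopt_flags & ~RED_HISTORIC;
	return 0;
}

/* nodrop only makes sense when ECN marking is enabled. */
int red_flags_validate(unsigned char flags)
{
	if ((flags & RED_NODROP) && !(flags & RED_ECN))
		return -1;
	return 0;
}

/* Small usage walk-through. */
void red_flags_demo(void)
{
	struct bitfield32 out;
	unsigned char user;

	/* Legacy path: ECN|HARDDROP plus one free-use bit (0x10). */
	assert(red_flags_resolve(0x13, NULL, &out, &user) == 0);
	assert(out.value == (RED_ECN | RED_HARDDROP));
	assert(user == 0x10);

	assert(red_flags_validate(RED_ECN | RED_NODROP) == 0);
	assert(red_flags_validate(RED_NODROP) == -1);
}
```

One thing visible in the hunk itself: supported_mask is accepted by red_get_flags() but not used inside it, so any check against the supported set presumably happens in the caller.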
3120diff --git a/include/trace/events/devlink.h b/include/trace/events/devlink.h
3121index 6f60a78..44d8e29 100644
3122--- a/include/trace/events/devlink.h
3123+++ b/include/trace/events/devlink.h
3124@@ -171,6 +171,43 @@ TRACE_EVENT(devlink_health_reporter_state_update,
3125 __entry->new_state)
3126 );
3127
3128+/*
3129+ * Tracepoint for devlink packet trap:
3130+ */
3131+TRACE_EVENT(devlink_trap_report,
3132+ TP_PROTO(const struct devlink *devlink, struct sk_buff *skb,
3133+ const struct devlink_trap_metadata *metadata),
3134+
3135+ TP_ARGS(devlink, skb, metadata),
3136+
3137+ TP_STRUCT__entry(
3138+ __string(bus_name, devlink->dev->bus->name)
3139+ __string(dev_name, dev_name(devlink->dev))
3140+ __string(driver_name, devlink->dev->driver->name)
3141+ __string(trap_name, metadata->trap_name)
3142+ __string(trap_group_name, metadata->trap_group_name)
3143+ __dynamic_array(char, input_dev_name, IFNAMSIZ)
3144+ ),
3145+
3146+ TP_fast_assign(
3147+ struct net_device *input_dev = metadata->input_dev;
3148+
3149+ __assign_str(bus_name, devlink->dev->bus->name);
3150+ __assign_str(dev_name, dev_name(devlink->dev));
3151+ __assign_str(driver_name, devlink->dev->driver->name);
3152+ __assign_str(trap_name, metadata->trap_name);
3153+ __assign_str(trap_group_name, metadata->trap_group_name);
3154+ __assign_str(input_dev_name,
3155+ (input_dev ? input_dev->name : "NULL"));
3156+ ),
3157+
3158+ TP_printk("bus_name=%s dev_name=%s driver_name=%s trap_name=%s "
3159+ "trap_group_name=%s input_dev_name=%s", __get_str(bus_name),
3160+ __get_str(dev_name), __get_str(driver_name),
3161+ __get_str(trap_name), __get_str(trap_group_name),
3162+ __get_str(input_dev_name))
3163+);
3164+
3165 #endif /* _TRACE_DEVLINK_H */
3166
3167 /* This part must be outside protection */
3168diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
3169index a8a2174..7896b44 100644
3170--- a/include/uapi/linux/devlink.h
3171+++ b/include/uapi/linux/devlink.h
3172@@ -117,6 +117,18 @@ enum devlink_command {
3173 DEVLINK_CMD_TRAP_GROUP_NEW,
3174 DEVLINK_CMD_TRAP_GROUP_DEL,
3175
3176+ DEVLINK_CMD_TRAP_POLICER_GET, /* can dump */
3177+ DEVLINK_CMD_TRAP_POLICER_SET,
3178+ DEVLINK_CMD_TRAP_POLICER_NEW,
3179+ DEVLINK_CMD_TRAP_POLICER_DEL,
3180+
3181+ DEVLINK_CMD_HEALTH_REPORTER_TEST,
3182+
3183+ DEVLINK_CMD_RATE_GET, /* can dump */
3184+ DEVLINK_CMD_RATE_SET,
3185+ DEVLINK_CMD_RATE_NEW,
3186+ DEVLINK_CMD_RATE_DEL,
3187+
3188 /* add new commands above here */
3189 __DEVLINK_CMD_MAX,
3190 DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
3191@@ -187,6 +199,19 @@ enum devlink_port_flavour {
3192 * for the PCI VF. It is an internal
3193 * port that faces the PCI VF.
3194 */
3195+ DEVLINK_PORT_FLAVOUR_VIRTUAL, /* Any virtual port facing the user. */
3196+ DEVLINK_PORT_FLAVOUR_UNUSED, /* Port which exists in the switch, but
3197+ * is not used in any way.
3198+ */
3199+ DEVLINK_PORT_FLAVOUR_PCI_SF, /* Represents eswitch port
3200+ * for the PCI SF. It is an internal
3201+ * port that faces the PCI SF.
3202+ */
3203+};
3204+
3205+enum devlink_rate_type {
3206+ DEVLINK_RATE_TYPE_LEAF,
3207+ DEVLINK_RATE_TYPE_NODE,
3208 };
3209
3210 enum devlink_param_cmode {
3211@@ -216,11 +241,34 @@ enum devlink_param_reset_dev_on_drv_probe_value {
3212 enum {
3213 DEVLINK_ATTR_STATS_RX_PACKETS, /* u64 */
3214 DEVLINK_ATTR_STATS_RX_BYTES, /* u64 */
3215+ DEVLINK_ATTR_STATS_RX_DROPPED, /* u64 */
3216
3217 __DEVLINK_ATTR_STATS_MAX,
3218 DEVLINK_ATTR_STATS_MAX = __DEVLINK_ATTR_STATS_MAX - 1
3219 };
3220
3221+/* Specify what sections of a flash component can be overwritten when
3222+ * performing an update. Overwriting of firmware binary sections is always
3223+ * implicitly assumed to be allowed.
3224+ *
3225+ * Each section must be documented in
3226+ * Documentation/networking/devlink/devlink-flash.rst
3227+ *
3228+ */
3229+enum {
3230+ DEVLINK_FLASH_OVERWRITE_SETTINGS_BIT,
3231+ DEVLINK_FLASH_OVERWRITE_IDENTIFIERS_BIT,
3232+
3233+ __DEVLINK_FLASH_OVERWRITE_MAX_BIT,
3234+ DEVLINK_FLASH_OVERWRITE_MAX_BIT = __DEVLINK_FLASH_OVERWRITE_MAX_BIT - 1
3235+};
3236+
3237+#define DEVLINK_FLASH_OVERWRITE_SETTINGS _BITUL(DEVLINK_FLASH_OVERWRITE_SETTINGS_BIT)
3238+#define DEVLINK_FLASH_OVERWRITE_IDENTIFIERS _BITUL(DEVLINK_FLASH_OVERWRITE_IDENTIFIERS_BIT)
3239+
3240+#define DEVLINK_SUPPORTED_FLASH_OVERWRITE_SECTIONS \
3241+ (_BITUL(__DEVLINK_FLASH_OVERWRITE_MAX_BIT) - 1)
3242+
3243 /**
3244 * enum devlink_trap_action - Packet trap action.
3245 * @DEVLINK_TRAP_ACTION_DROP: Packet is dropped by the device and a copy is not
3246@@ -243,17 +291,48 @@ enum devlink_trap_action {
3247 * control plane for resolution. Trapped packets
3248 * are processed by devlink and injected to
3249 * the kernel's Rx path.
3250+ * @DEVLINK_TRAP_TYPE_CONTROL: Packet was trapped because it is required for
3251+ * the correct functioning of the control plane.
3252+ * For example, an ARP request packet. Trapped
3253+ * packets are injected to the kernel's Rx path,
3254+ * but not reported to drop monitor.
3255 */
3256 enum devlink_trap_type {
3257 DEVLINK_TRAP_TYPE_DROP,
3258 DEVLINK_TRAP_TYPE_EXCEPTION,
3259+ DEVLINK_TRAP_TYPE_CONTROL,
3260 };
3261
3262 enum {
3263 /* Trap can report input port as metadata */
3264 DEVLINK_ATTR_TRAP_METADATA_TYPE_IN_PORT,
3265+ /* Trap can report flow action cookie as metadata */
3266+ DEVLINK_ATTR_TRAP_METADATA_TYPE_FA_COOKIE,
3267 };
3268
3269+enum devlink_reload_action {
3270+ DEVLINK_RELOAD_ACTION_UNSPEC,
3271+ DEVLINK_RELOAD_ACTION_DRIVER_REINIT, /* Driver entities re-instantiation */
3272+ DEVLINK_RELOAD_ACTION_FW_ACTIVATE, /* FW activate */
3273+
3274+ /* Add new reload actions above */
3275+ __DEVLINK_RELOAD_ACTION_MAX,
3276+ DEVLINK_RELOAD_ACTION_MAX = __DEVLINK_RELOAD_ACTION_MAX - 1
3277+};
3278+
3279+enum devlink_reload_limit {
3280+ DEVLINK_RELOAD_LIMIT_UNSPEC, /* unspecified, no constraints */
3281+ DEVLINK_RELOAD_LIMIT_NO_RESET, /* No reset allowed, no down time allowed,
3282+ * no link flap and no configuration is lost.
3283+ */
3284+
3285+ /* Add new reload limit above */
3286+ __DEVLINK_RELOAD_LIMIT_MAX,
3287+ DEVLINK_RELOAD_LIMIT_MAX = __DEVLINK_RELOAD_LIMIT_MAX - 1
3288+};
3289+
3290+#define DEVLINK_RELOAD_LIMITS_VALID_MASK (_BITUL(__DEVLINK_RELOAD_LIMIT_MAX) - 1)
3291+
3292 enum devlink_attr {
3293 /* don't change the order or add anything between, this is ABI! */
3294 DEVLINK_ATTR_UNSPEC,
3295@@ -422,6 +501,51 @@ enum devlink_attr {
3296 DEVLINK_ATTR_RELOAD_FAILED, /* u8 0 or 1 */
3297
3298 DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS_NS, /* u64 */
3299+
3300+ DEVLINK_ATTR_NETNS_FD, /* u32 */
3301+ DEVLINK_ATTR_NETNS_PID, /* u32 */
3302+ DEVLINK_ATTR_NETNS_ID, /* u32 */
3303+
3304+ DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP, /* u8 */
3305+
3306+ DEVLINK_ATTR_TRAP_POLICER_ID, /* u32 */
3307+ DEVLINK_ATTR_TRAP_POLICER_RATE, /* u64 */
3308+ DEVLINK_ATTR_TRAP_POLICER_BURST, /* u64 */
3309+
3310+ DEVLINK_ATTR_PORT_FUNCTION, /* nested */
3311+
3312+ DEVLINK_ATTR_INFO_BOARD_SERIAL_NUMBER, /* string */
3313+
3314+ DEVLINK_ATTR_PORT_LANES, /* u32 */
3315+ DEVLINK_ATTR_PORT_SPLITTABLE, /* u8 */
3316+
3317+ DEVLINK_ATTR_PORT_EXTERNAL, /* u8 */
3318+ DEVLINK_ATTR_PORT_CONTROLLER_NUMBER, /* u32 */
3319+
3320+ DEVLINK_ATTR_FLASH_UPDATE_STATUS_TIMEOUT, /* u64 */
3321+ DEVLINK_ATTR_FLASH_UPDATE_OVERWRITE_MASK, /* bitfield32 */
3322+
3323+ DEVLINK_ATTR_RELOAD_ACTION, /* u8 */
3324+ DEVLINK_ATTR_RELOAD_ACTIONS_PERFORMED, /* bitfield32 */
3325+ DEVLINK_ATTR_RELOAD_LIMITS, /* bitfield32 */
3326+
3327+ DEVLINK_ATTR_DEV_STATS, /* nested */
3328+ DEVLINK_ATTR_RELOAD_STATS, /* nested */
3329+ DEVLINK_ATTR_RELOAD_STATS_ENTRY, /* nested */
3330+ DEVLINK_ATTR_RELOAD_STATS_LIMIT, /* u8 */
3331+ DEVLINK_ATTR_RELOAD_STATS_VALUE, /* u32 */
3332+ DEVLINK_ATTR_REMOTE_RELOAD_STATS, /* nested */
3333+ DEVLINK_ATTR_RELOAD_ACTION_INFO, /* nested */
3334+ DEVLINK_ATTR_RELOAD_ACTION_STATS, /* nested */
3335+
3336+ DEVLINK_ATTR_PORT_PCI_SF_NUMBER, /* u32 */
3337+
3338+ DEVLINK_ATTR_RATE_TYPE, /* u16 */
3339+ DEVLINK_ATTR_RATE_TX_SHARE, /* u64 */
3340+ DEVLINK_ATTR_RATE_TX_MAX, /* u64 */
3341+ DEVLINK_ATTR_RATE_NODE_NAME, /* string */
3342+ DEVLINK_ATTR_RATE_PARENT_NODE_NAME, /* string */
3343+
3344 /* add new attributes above here, update the policy in devlink.c */
3345
3346 __DEVLINK_ATTR_MAX,
3347@@ -468,4 +592,32 @@ enum devlink_resource_unit {
3348 DEVLINK_RESOURCE_UNIT_ENTRY,
3349 };
3350
3351+enum devlink_port_function_attr {
3352+ DEVLINK_PORT_FUNCTION_ATTR_UNSPEC,
3353+ DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR, /* binary */
3354+ DEVLINK_PORT_FN_ATTR_STATE, /* u8 */
3355+ DEVLINK_PORT_FN_ATTR_OPSTATE, /* u8 */
3356+
3357+ __DEVLINK_PORT_FUNCTION_ATTR_MAX,
3358+ DEVLINK_PORT_FUNCTION_ATTR_MAX = __DEVLINK_PORT_FUNCTION_ATTR_MAX - 1
3359+};
3360+
3361+enum devlink_port_fn_state {
3362+ DEVLINK_PORT_FN_STATE_INACTIVE,
3363+ DEVLINK_PORT_FN_STATE_ACTIVE,
3364+};
3365+
3366+/**
3367+ * enum devlink_port_fn_opstate - indicates operational state of the function
3368+ * @DEVLINK_PORT_FN_OPSTATE_ATTACHED: Driver is attached to the function.
3369+ * For graceful tear down of the function, after inactivation of the
3370+ * function, user should wait for operational state to turn DETACHED.
3371+ * @DEVLINK_PORT_FN_OPSTATE_DETACHED: Driver is detached from the function.
3372+ * It is safe to delete the port.
3373+ */
3374+enum devlink_port_fn_opstate {
3375+ DEVLINK_PORT_FN_OPSTATE_DETACHED,
3376+ DEVLINK_PORT_FN_OPSTATE_ATTACHED,
3377+};
3378+
3379 #endif /* _UAPI_LINUX_DEVLINK_H_ */
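Review note on the include/uapi/linux/devlink.h hunk above: the flash overwrite section defines bit positions in an anonymous enum and derives the "all supported bits" mask from the enum's terminator, so adding a section automatically widens the mask. A minimal userspace sketch of that idiom follows. All names here (BITUL, FLASH_OVERWRITE_*, overwrite_mask_is_supported) are hypothetical stand-ins chosen for the example, not the uapi identifiers, and BITUL assumes the same meaning as the kernel's _BITUL().

```c
#include <assert.h>

/* Hypothetical equivalent of _BITUL() from the uapi headers. */
#define BITUL(n) (1UL << (n))

/* Bit positions; the last entry counts how many bits are defined. */
enum {
	FLASH_OVERWRITE_SETTINGS_BIT,
	FLASH_OVERWRITE_IDENTIFIERS_BIT,

	FLASH_OVERWRITE_BIT_COUNT
};

#define FLASH_OVERWRITE_SETTINGS    BITUL(FLASH_OVERWRITE_SETTINGS_BIT)
#define FLASH_OVERWRITE_IDENTIFIERS BITUL(FLASH_OVERWRITE_IDENTIFIERS_BIT)

/* With N defined bits, (1 << N) - 1 sets exactly bits 0..N-1. */
#define FLASH_OVERWRITE_SUPPORTED (BITUL(FLASH_OVERWRITE_BIT_COUNT) - 1)

/* A requested mask is acceptable only if it sets no undefined bit. */
int overwrite_mask_is_supported(unsigned long mask)
{
	return (mask & ~FLASH_OVERWRITE_SUPPORTED) == 0;
}

/* Small usage walk-through. */
void overwrite_mask_demo(void)
{
	assert(overwrite_mask_is_supported(FLASH_OVERWRITE_SETTINGS |
					   FLASH_OVERWRITE_IDENTIFIERS));
	assert(!overwrite_mask_is_supported(BITUL(FLASH_OVERWRITE_BIT_COUNT)));
}
```

The same construction appears to be what DEVLINK_RELOAD_LIMITS_VALID_MASK relies on in this hunk, with the reload limit enum's terminator as the bit count.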
3380diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
3381index 4a245d7..0070ef1 100644
3382--- a/include/uapi/linux/pkt_sched.h
3383+++ b/include/uapi/linux/pkt_sched.h
3384@@ -256,6 +256,7 @@ enum {
3385 TCA_RED_PARMS,
3386 TCA_RED_STAB,
3387 TCA_RED_MAX_P,
3388+ TCA_RED_FLAGS, /* bitfield32 */
3389 __TCA_RED_MAX,
3390 };
3391
3392@@ -268,12 +269,28 @@ struct tc_red_qopt {
3393 unsigned char Wlog; /* log(W) */
3394 unsigned char Plog; /* log(P_max/(qth_max-qth_min)) */
3395 unsigned char Scell_log; /* cell size for idle damping */
3396+
3397+ /* This field can be used for flags that a RED-like qdisc has
3398+ * historically supported. E.g. when configuring RED, it can be used for
3399+ * ECN, HARDDROP and ADAPTATIVE. For SFQ it can be used for ECN,
3400+ * HARDDROP. Etc. Because this field has not been validated, and is
3401+ * copied back on dump, any bits besides those to which a given qdisc
3402+ * has assigned a historical meaning need to be considered for free use
3403+ * by userspace tools.
3404+ *
3405+ * Any further flags need to be passed differently, e.g. through an
3406+ * attribute (such as TCA_RED_FLAGS above). Such attribute should allow
3407+ * passing both recent and historic flags in one value.
3408+ */
3409 unsigned char flags;
3410 #define TC_RED_ECN 1
3411 #define TC_RED_HARDDROP 2
3412 #define TC_RED_ADAPTATIVE 4
3413+#define TC_RED_NODROP 8
3414 };
3415
3416+#define TC_RED_HISTORIC_FLAGS (TC_RED_ECN | TC_RED_HARDDROP | TC_RED_ADAPTATIVE)
3417+
3418 struct tc_red_xstats {
3419 __u32 early; /* Early drops */
3420 __u32 pdrop; /* Drops due to queue limits */
3421diff --git a/lib/nlattr.c b/lib/nlattr.c
3422index c94b014..b61f3a5 100644
3423--- a/lib/nlattr.c
3424+++ b/lib/nlattr.c
3425@@ -45,7 +45,7 @@ static const u8 nla_attr_minlen[NLA_TYPE_MAX+1] = {
3426 };
3427
3428 static int validate_nla_bitfield32(const struct nlattr *nla,
3429- const u32 *valid_flags_mask)
3430+ const u32 valid_flags_mask)
3431 {
3432 const struct nla_bitfield32 *bf = nla_data(nla);
3433
3434@@ -53,11 +53,11 @@ static int validate_nla_bitfield32(const struct nlattr *nla,
3435 return -EINVAL;
3436
3437 /*disallow invalid bit selector */
3438- if (bf->selector & ~*valid_flags_mask)
3439+ if (bf->selector & ~valid_flags_mask)
3440 return -EINVAL;
3441
3442 /*disallow invalid bit values */
3443- if (bf->value & ~*valid_flags_mask)
3444+ if (bf->value & ~valid_flags_mask)
3445 return -EINVAL;
3446
3447 /*disallow valid bit values that are not selected*/
3448@@ -237,9 +237,9 @@ static int validate_nla(const struct nlattr *nla, int maxtype,
3449 break;
3450
3451 case NLA_REJECT:
3452- if (extack && pt->validation_data) {
3453+ if (extack && pt->reject_message) {
3454 NL_SET_BAD_ATTR(extack, nla);
3455- extack->_msg = pt->validation_data;
3456+ extack->_msg = pt->reject_message;
3457 return -EINVAL;
3458 }
3459 err = -EINVAL;
3460@@ -254,7 +254,7 @@ static int validate_nla(const struct nlattr *nla, int maxtype,
3461 if (attrlen != sizeof(struct nla_bitfield32))
3462 goto out_err;
3463
3464- err = validate_nla_bitfield32(nla, pt->validation_data);
3465+ err = validate_nla_bitfield32(nla, pt->bitfield32_valid);
3466 if (err)
3467 goto out_err;
3468 break;
3469@@ -299,9 +299,9 @@ static int validate_nla(const struct nlattr *nla, int maxtype,
3470 break;
3471 if (attrlen < NLA_HDRLEN)
3472 goto out_err;
3473- if (pt->validation_data) {
3474+ if (pt->nested_policy) {
3475 err = __nla_validate(nla_data(nla), nla_len(nla), pt->len,
3476- pt->validation_data, validate,
3477+ pt->nested_policy, validate,
3478 extack);
3479 if (err < 0) {
3480 /*
3481@@ -320,11 +320,11 @@ static int validate_nla(const struct nlattr *nla, int maxtype,
3482 break;
3483 if (attrlen < NLA_HDRLEN)
3484 goto out_err;
3485- if (pt->validation_data) {
3486+ if (pt->nested_policy) {
3487 int err;
3488
3489 err = nla_validate_array(nla_data(nla), nla_len(nla),
3490- pt->len, pt->validation_data,
3491+ pt->len, pt->nested_policy,
3492 extack, validate);
3493 if (err < 0) {
3494 /*
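Review note on the lib/nlattr.c hunk above: with this change the valid-flags mask for NLA_BITFIELD32 is stored by value in the policy (bitfield32_valid) instead of being reached through a pointer. The checks themselves are unchanged. The sketch below restates them in userspace C. It is an assumption-laden illustration: struct bitfield32 and bitfield32_validate() are hypothetical names, and the third check is inferred from the "valid bit values that are not selected" comment in the hunk, since the line it guards is outside the shown context.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct nla_bitfield32. */
struct bitfield32 {
	uint32_t value;    /* bit values the sender wants to apply */
	uint32_t selector; /* which bits of 'value' are meaningful */
};

/* Returns 0 if the attribute is acceptable for the given mask, -1 if not. */
int bitfield32_validate(const struct bitfield32 *bf, uint32_t valid_mask)
{
	/* Selector must not name bits outside the valid mask. */
	if (bf->selector & ~valid_mask)
		return -1;

	/* Value must not set bits outside the valid mask. */
	if (bf->value & ~valid_mask)
		return -1;

	/* Value must not set bits that the selector did not select. */
	if (bf->value & ~bf->selector)
		return -1;

	return 0;
}

/* Small usage walk-through with a two-bit valid mask. */
void bitfield32_demo(void)
{
	struct bitfield32 set_bit0 = { 0x1, 0x1 };
	struct bitfield32 clear_bit1 = { 0x0, 0x2 };
	struct bitfield32 unselected = { 0x2, 0x1 };

	assert(bitfield32_validate(&set_bit0, 0x3) == 0);
	assert(bitfield32_validate(&clear_bit1, 0x3) == 0);
	assert(bitfield32_validate(&unselected, 0x3) == -1);
}
```

Passing the mask by value also removes the need for a separate static u32 whose only purpose was to give the policy something to point at; the NLA_POLICY_BITFIELD32() macro added in include/net/netlink.h takes the mask directly.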
3495diff --git a/net/Kconfig b/net/Kconfig
3496index 0b2fecc..5aa30bc 100644
3497--- a/net/Kconfig
3498+++ b/net/Kconfig
3499@@ -433,7 +433,6 @@ config NET_SOCK_MSG
3500 config NET_DEVLINK
3501 bool
3502 default n
3503- imply NET_DROP_MONITOR
3504
3505 config PAGE_POOL
3506 bool
3507diff --git a/net/core/devlink.c b/net/core/devlink.c
3508index 09eb3e6..0ee0fb6 100644
3509--- a/net/core/devlink.c
3510+++ b/net/core/devlink.c
3511@@ -27,7 +27,6 @@
3512 #include <net/net_namespace.h>
3513 #include <net/sock.h>
3514 #include <net/devlink.h>
3515-#include <net/drop_monitor.h>
3516 #define CREATE_TRACE_POINTS
3517 #include <trace/events/devlink.h>
3518
3519@@ -84,6 +83,14 @@ EXPORT_SYMBOL(devlink_dpipe_header_ipv6);
3520
3521 EXPORT_TRACEPOINT_SYMBOL_GPL(devlink_hwmsg);
3522 EXPORT_TRACEPOINT_SYMBOL_GPL(devlink_hwerr);
3523+EXPORT_TRACEPOINT_SYMBOL_GPL(devlink_trap_report);
3524+
3525+static const struct nla_policy devlink_function_nl_policy[DEVLINK_PORT_FUNCTION_ATTR_MAX + 1] = {
3526+ [DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR] = { .type = NLA_BINARY },
3527+ [DEVLINK_PORT_FN_ATTR_STATE] =
3528+ NLA_POLICY_RANGE(NLA_U8, DEVLINK_PORT_FN_STATE_INACTIVE,
3529+ DEVLINK_PORT_FN_STATE_ACTIVE),
3530+};
3531
3532 static LIST_HEAD(devlink_list);
3533
3534@@ -95,15 +102,11 @@ static LIST_HEAD(devlink_list);
3535 */
3536 static DEFINE_MUTEX(devlink_mutex);
3537
3538-static struct net *devlink_net(const struct devlink *devlink)
3539+struct net *devlink_net(const struct devlink *devlink)
3540 {
3541 return read_pnet(&devlink->_net);
3542 }
3543-
3544-static void devlink_net_set(struct devlink *devlink, struct net *net)
3545-{
3546- write_pnet(&devlink->_net, net);
3547-}
3548+EXPORT_SYMBOL_GPL(devlink_net);
3549
3550 static struct devlink *devlink_get_from_attrs(struct net *net,
3551 struct nlattr **attrs)
3552@@ -174,6 +177,80 @@ static struct devlink_port *devlink_port_get_from_info(struct devlink *devlink,
3553 return devlink_port_get_from_attrs(devlink, info->attrs);
3554 }
3555
3556+static inline bool
3557+devlink_rate_is_leaf(struct devlink_rate *devlink_rate)
3558+{
3559+ return devlink_rate->type == DEVLINK_RATE_TYPE_LEAF;
3560+}
3561+
3562+static inline bool
3563+devlink_rate_is_node(struct devlink_rate *devlink_rate)
3564+{
3565+ return devlink_rate->type == DEVLINK_RATE_TYPE_NODE;
3566+}
3567+
3568+static struct devlink_rate *
3569+devlink_rate_leaf_get_from_info(struct devlink *devlink, struct genl_info *info)
3570+{
3571+ struct devlink_rate *devlink_rate;
3572+ struct devlink_port *devlink_port;
3573+
3574+ devlink_port = devlink_port_get_from_attrs(devlink, info->attrs);
3575+ if (IS_ERR(devlink_port))
3576+ return ERR_CAST(devlink_port);
3577+ devlink_rate = devlink_port->devlink_rate;
3578+ return devlink_rate ?: ERR_PTR(-ENODEV);
3579+}
3580+
3581+static struct devlink_rate *
3582+devlink_rate_node_get_by_name(struct devlink *devlink, const char *node_name)
3583+{
3584+ struct devlink_rate *devlink_rate;
3585+
3586+ list_for_each_entry(devlink_rate, &devlink->rate_list, list) {
3587+ if (devlink_rate_is_node(devlink_rate) &&
3588+ !strcmp(node_name, devlink_rate->name))
3589+ return devlink_rate;
3590+ }
3591+ return ERR_PTR(-ENODEV);
3592+}
3593+
3594+static struct devlink_rate *
3595+devlink_rate_node_get_from_attrs(struct devlink *devlink, struct nlattr **attrs)
3596+{
3597+ const char *rate_node_name;
3598+ size_t len;
3599+
3600+ if (!attrs[DEVLINK_ATTR_RATE_NODE_NAME])
3601+ return ERR_PTR(-EINVAL);
3602+ rate_node_name = nla_data(attrs[DEVLINK_ATTR_RATE_NODE_NAME]);
3603+ len = strlen(rate_node_name);
3604+ /* Name cannot be empty or a decimal number */
3605+ if (!len || strspn(rate_node_name, "0123456789") == len)
3606+ return ERR_PTR(-EINVAL);
3607+
3608+ return devlink_rate_node_get_by_name(devlink, rate_node_name);
3609+}
3610+
3611+static struct devlink_rate *
3612+devlink_rate_node_get_from_info(struct devlink *devlink, struct genl_info *info)
3613+{
3614+ return devlink_rate_node_get_from_attrs(devlink, info->attrs);
3615+}
3616+
3617+static struct devlink_rate *
3618+devlink_rate_get_from_info(struct devlink *devlink, struct genl_info *info)
3619+{
3620+ struct nlattr **attrs = info->attrs;
3621+
3622+ if (attrs[DEVLINK_ATTR_PORT_INDEX])
3623+ return devlink_rate_leaf_get_from_info(devlink, info);
3624+ else if (attrs[DEVLINK_ATTR_RATE_NODE_NAME])
3625+ return devlink_rate_node_get_from_info(devlink, info);
3626+ else
3627+ return ERR_PTR(-EINVAL);
3628+}
3629+
3630 struct devlink_sb {
3631 struct list_head list;
3632 unsigned int index;
3633@@ -334,8 +411,12 @@ devlink_sb_tc_index_get_from_info(struct devlink_sb *devlink_sb,
3634
3635 struct devlink_region {
3636 struct devlink *devlink;
3637+ struct devlink_port *port;
3638 struct list_head list;
3639- const char *name;
3640+ union {
3641+ const struct devlink_region_ops *ops;
3642+ const struct devlink_port_region_ops *port_ops;
3643+ };
3644 struct list_head snapshot_list;
3645 u32 max_snapshots;
3646 u32 cur_snapshots;
3647@@ -345,7 +426,6 @@ struct devlink_region {
3648 struct devlink_snapshot {
3649 struct list_head list;
3650 struct devlink_region *region;
3651- devlink_snapshot_data_dest_t *data_destructor;
3652 u8 *data;
3653 u32 id;
3654 };
3655@@ -356,7 +436,20 @@ devlink_region_get_by_name(struct devlink *devlink, const char *region_name)
3656 struct devlink_region *region;
3657
3658 list_for_each_entry(region, &devlink->region_list, list)
3659- if (!strcmp(region->name, region_name))
3660+ if (!strcmp(region->ops->name, region_name))
3661+ return region;
3662+
3663+ return NULL;
3664+}
3665+
3666+static struct devlink_region *
3667+devlink_port_region_get_by_name(struct devlink_port *port,
3668+ const char *region_name)
3669+{
3670+ struct devlink_region *region;
3671+
3672+ list_for_each_entry(region, &port->region_list, list)
3673+ if (!strcmp(region->ops->name, region_name))
3674 return region;
3675
3676 return NULL;
3677@@ -374,19 +467,21 @@ devlink_region_snapshot_get_by_id(struct devlink_region *region, u32 id)
3678 return NULL;
3679 }
3680
3681-#define DEVLINK_NL_FLAG_NEED_DEVLINK BIT(0)
3682-#define DEVLINK_NL_FLAG_NEED_PORT BIT(1)
3683-#define DEVLINK_NL_FLAG_NEED_SB BIT(2)
3684+#define DEVLINK_NL_FLAG_NEED_PORT BIT(0)
3685+#define DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT BIT(1)
3686+#define DEVLINK_NL_FLAG_NEED_RATE BIT(2)
3687+#define DEVLINK_NL_FLAG_NEED_RATE_NODE BIT(3)
3688
3689 /* The per devlink instance lock is taken by default in the pre-doit
3690 * operation, yet several commands do not require this. The global
3691 * devlink lock is taken and protects from disruption by user-calls.
3692 */
3693-#define DEVLINK_NL_FLAG_NO_LOCK BIT(3)
3694+#define DEVLINK_NL_FLAG_NO_LOCK BIT(4)
3695
3696 static int devlink_nl_pre_doit(const struct genl_ops *ops,
3697 struct sk_buff *skb, struct genl_info *info)
3698 {
3699+ struct devlink_port *devlink_port;
3700 struct devlink *devlink;
3701 int err;
3702
3703@@ -398,27 +493,36 @@ static int devlink_nl_pre_doit(const struct genl_ops *ops,
3704 }
3705 if (~ops->internal_flags & DEVLINK_NL_FLAG_NO_LOCK)
3706 mutex_lock(&devlink->lock);
3707- if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_DEVLINK) {
3708- info->user_ptr[0] = devlink;
3709- } else if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_PORT) {
3710- struct devlink_port *devlink_port;
3711-
3712+ info->user_ptr[0] = devlink;
3713+ if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_PORT) {
3714 devlink_port = devlink_port_get_from_info(devlink, info);
3715 if (IS_ERR(devlink_port)) {
3716 err = PTR_ERR(devlink_port);
3717 goto unlock;
3718 }
3719- info->user_ptr[0] = devlink_port;
3720- }
3721- if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_SB) {
3722- struct devlink_sb *devlink_sb;
3723+ info->user_ptr[1] = devlink_port;
3724+ } else if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT) {
3725+ devlink_port = devlink_port_get_from_info(devlink, info);
3726+ if (!IS_ERR(devlink_port))
3727+ info->user_ptr[1] = devlink_port;
3728+ } else if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_RATE) {
3729+ struct devlink_rate *devlink_rate;
3730+
3731+ devlink_rate = devlink_rate_get_from_info(devlink, info);
3732+ if (IS_ERR(devlink_rate)) {
3733+ err = PTR_ERR(devlink_rate);
3734+ goto unlock;
3735+ }
3736+ info->user_ptr[1] = devlink_rate;
3737+ } else if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_RATE_NODE) {
3738+ struct devlink_rate *rate_node;
3739
3740- devlink_sb = devlink_sb_get_from_info(devlink, info);
3741- if (IS_ERR(devlink_sb)) {
3742- err = PTR_ERR(devlink_sb);
3743+ rate_node = devlink_rate_node_get_from_info(devlink, info);
3744+ if (IS_ERR(rate_node)) {
3745+ err = PTR_ERR(rate_node);
3746 goto unlock;
3747 }
3748- info->user_ptr[1] = devlink_sb;
3749+ info->user_ptr[1] = rate_node;
3750 }
3751 return 0;
3752
3753@@ -434,7 +538,7 @@ static void devlink_nl_post_doit(const struct genl_ops *ops,
3754 {
3755 struct devlink *devlink;
3756
3757- devlink = devlink_get_from_info(info);
3758+ devlink = info->user_ptr[0];
3759 if (~ops->internal_flags & DEVLINK_NL_FLAG_NO_LOCK)
3760 mutex_unlock(&devlink->lock);
3761 mutex_unlock(&devlink_mutex);
3762@@ -459,10 +563,132 @@ static int devlink_nl_put_handle(struct sk_buff *msg, struct devlink *devlink)
3763 return 0;
3764 }
3765
3766+struct devlink_reload_combination {
3767+ enum devlink_reload_action action;
3768+ enum devlink_reload_limit limit;
3769+};
3770+
3771+static const struct devlink_reload_combination devlink_reload_invalid_combinations[] = {
3772+ {
3773+ /* can't reinitialize driver with no down time */
3774+ .action = DEVLINK_RELOAD_ACTION_DRIVER_REINIT,
3775+ .limit = DEVLINK_RELOAD_LIMIT_NO_RESET,
3776+ },
3777+};
3778+
3779+static bool
3780+devlink_reload_combination_is_invalid(enum devlink_reload_action action,
3781+ enum devlink_reload_limit limit)
3782+{
3783+ int i;
3784+
3785+ for (i = 0; i < ARRAY_SIZE(devlink_reload_invalid_combinations); i++)
3786+ if (devlink_reload_invalid_combinations[i].action == action &&
3787+ devlink_reload_invalid_combinations[i].limit == limit)
3788+ return true;
3789+ return false;
3790+}
3791+
3792+static bool
3793+devlink_reload_action_is_supported(struct devlink *devlink, enum devlink_reload_action action)
3794+{
3795+ return test_bit(action, &devlink->ops->reload_actions);
3796+}
3797+
3798+static bool
3799+devlink_reload_limit_is_supported(struct devlink *devlink, enum devlink_reload_limit limit)
3800+{
3801+ return test_bit(limit, &devlink->ops->reload_limits);
3802+}
3803+
3804+static int devlink_reload_stat_put(struct sk_buff *msg,
3805+ enum devlink_reload_limit limit, u32 value)
3806+{
3807+ struct nlattr *reload_stats_entry;
3808+
3809+ reload_stats_entry = nla_nest_start(msg, DEVLINK_ATTR_RELOAD_STATS_ENTRY);
3810+ if (!reload_stats_entry)
3811+ return -EMSGSIZE;
3812+
3813+ if (nla_put_u8(msg, DEVLINK_ATTR_RELOAD_STATS_LIMIT, limit) ||
3814+ nla_put_u32(msg, DEVLINK_ATTR_RELOAD_STATS_VALUE, value))
3815+ goto nla_put_failure;
3816+ nla_nest_end(msg, reload_stats_entry);
3817+ return 0;
3818+
3819+nla_put_failure:
3820+ nla_nest_cancel(msg, reload_stats_entry);
3821+ return -EMSGSIZE;
3822+}
3823+
3824+static int devlink_reload_stats_put(struct sk_buff *msg, struct devlink *devlink, bool is_remote)
3825+{
3826+ struct nlattr *reload_stats_attr, *act_info, *act_stats;
3827+ int i, j, stat_idx;
3828+ u32 value;
3829+
3830+ if (!is_remote)
3831+ reload_stats_attr = nla_nest_start(msg, DEVLINK_ATTR_RELOAD_STATS);
3832+ else
3833+ reload_stats_attr = nla_nest_start(msg, DEVLINK_ATTR_REMOTE_RELOAD_STATS);
3834+
3835+ if (!reload_stats_attr)
3836+ return -EMSGSIZE;
3837+
3838+ for (i = 0; i <= DEVLINK_RELOAD_ACTION_MAX; i++) {
3839+ if ((!is_remote &&
3840+ !devlink_reload_action_is_supported(devlink, i)) ||
3841+ i == DEVLINK_RELOAD_ACTION_UNSPEC)
3842+ continue;
3843+ act_info = nla_nest_start(msg, DEVLINK_ATTR_RELOAD_ACTION_INFO);
3844+ if (!act_info)
3845+ goto nla_put_failure;
3846+
3847+ if (nla_put_u8(msg, DEVLINK_ATTR_RELOAD_ACTION, i))
3848+ goto action_info_nest_cancel;
3849+ act_stats = nla_nest_start(msg, DEVLINK_ATTR_RELOAD_ACTION_STATS);
3850+ if (!act_stats)
3851+ goto action_info_nest_cancel;
3852+
3853+ for (j = 0; j <= DEVLINK_RELOAD_LIMIT_MAX; j++) {
3854+ /* Remote stats are shown even if not locally supported.
3855+ * Stats of actions with unspecified limit are shown
3856+ * though drivers don't need to register unspecified
3857+ * limit.
3858+ */
3859+ if ((!is_remote && j != DEVLINK_RELOAD_LIMIT_UNSPEC &&
3860+ !devlink_reload_limit_is_supported(devlink, j)) ||
3861+ devlink_reload_combination_is_invalid(i, j))
3862+ continue;
3863+
3864+ stat_idx = j * __DEVLINK_RELOAD_ACTION_MAX + i;
3865+ if (!is_remote)
3866+ value = devlink->stats.reload_stats[stat_idx];
3867+ else
3868+ value = devlink->stats.remote_reload_stats[stat_idx];
3869+ if (devlink_reload_stat_put(msg, j, value))
3870+ goto action_stats_nest_cancel;
3871+ }
3872+ nla_nest_end(msg, act_stats);
3873+ nla_nest_end(msg, act_info);
3874+ }
3875+ nla_nest_end(msg, reload_stats_attr);
3876+ return 0;
3877+
3878+action_stats_nest_cancel:
3879+ nla_nest_cancel(msg, act_stats);
3880+action_info_nest_cancel:
3881+ nla_nest_cancel(msg, act_info);
3882+nla_put_failure:
3883+ nla_nest_cancel(msg, reload_stats_attr);
3884+ return -EMSGSIZE;
3885+}
3886+
3887 static int devlink_nl_fill(struct sk_buff *msg, struct devlink *devlink,
3888 enum devlink_command cmd, u32 portid,
3889 u32 seq, int flags)
3890 {
3891+ struct nlattr *dev_stats;
3892 void *hdr;
3893
3894 hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
3895@@ -474,9 +700,21 @@ static int devlink_nl_fill(struct sk_buff *msg, struct devlink *devlink,
3896 if (nla_put_u8(msg, DEVLINK_ATTR_RELOAD_FAILED, devlink->reload_failed))
3897 goto nla_put_failure;
3898
3899+ dev_stats = nla_nest_start(msg, DEVLINK_ATTR_DEV_STATS);
3900+ if (!dev_stats)
3901+ goto nla_put_failure;
3902+
3903+ if (devlink_reload_stats_put(msg, devlink, false))
3904+ goto dev_stats_nest_cancel;
3905+ if (devlink_reload_stats_put(msg, devlink, true))
3906+ goto dev_stats_nest_cancel;
3907+
3908+ nla_nest_end(msg, dev_stats);
3909 genlmsg_end(msg, hdr);
3910 return 0;
3911
3912+dev_stats_nest_cancel:
3913+ nla_nest_cancel(msg, dev_stats);
3914 nla_put_failure:
3915 genlmsg_cancel(msg, hdr);
3916 return -EMSGSIZE;
3917@@ -508,26 +746,47 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg,
3918 {
3919 struct devlink_port_attrs *attrs = &devlink_port->attrs;
3920
3921- if (!attrs->set)
3922+ if (!devlink_port->attrs_set)
3923 return 0;
3924+ if (attrs->lanes) {
3925+ if (nla_put_u32(msg, DEVLINK_ATTR_PORT_LANES, attrs->lanes))
3926+ return -EMSGSIZE;
3927+ }
3928+ if (nla_put_u8(msg, DEVLINK_ATTR_PORT_SPLITTABLE, attrs->splittable))
3929+ return -EMSGSIZE;
3930 if (nla_put_u16(msg, DEVLINK_ATTR_PORT_FLAVOUR, attrs->flavour))
3931 return -EMSGSIZE;
3932 switch (devlink_port->attrs.flavour) {
3933 case DEVLINK_PORT_FLAVOUR_PCI_PF:
3934- if (nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER,
3935- attrs->pci_pf.pf))
3936+ if (nla_put_u32(msg, DEVLINK_ATTR_PORT_CONTROLLER_NUMBER,
3937+ attrs->pci_pf.controller) ||
3938+ nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER, attrs->pci_pf.pf))
3939+ return -EMSGSIZE;
3940+ if (nla_put_u8(msg, DEVLINK_ATTR_PORT_EXTERNAL, attrs->pci_pf.external))
3941 return -EMSGSIZE;
3942 break;
3943 case DEVLINK_PORT_FLAVOUR_PCI_VF:
3944- if (nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER,
3945- attrs->pci_vf.pf) ||
3946- nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_VF_NUMBER,
3947- attrs->pci_vf.vf))
3948+ if (nla_put_u32(msg, DEVLINK_ATTR_PORT_CONTROLLER_NUMBER,
3949+ attrs->pci_vf.controller) ||
3950+ nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER, attrs->pci_vf.pf) ||
3951+ nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_VF_NUMBER, attrs->pci_vf.vf))
3952+ return -EMSGSIZE;
3953+ if (nla_put_u8(msg, DEVLINK_ATTR_PORT_EXTERNAL, attrs->pci_vf.external))
3954+ return -EMSGSIZE;
3955+ break;
3956+ case DEVLINK_PORT_FLAVOUR_PCI_SF:
3957+ if (nla_put_u32(msg, DEVLINK_ATTR_PORT_CONTROLLER_NUMBER,
3958+ attrs->pci_sf.controller) ||
3959+ nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER,
3960+ attrs->pci_sf.pf) ||
3961+ nla_put_u32(msg, DEVLINK_ATTR_PORT_PCI_SF_NUMBER,
3962+ attrs->pci_sf.sf))
3963 return -EMSGSIZE;
3964 break;
3965 case DEVLINK_PORT_FLAVOUR_PHYSICAL:
3966 case DEVLINK_PORT_FLAVOUR_CPU:
3967 case DEVLINK_PORT_FLAVOUR_DSA:
3968+ case DEVLINK_PORT_FLAVOUR_VIRTUAL:
3969 if (nla_put_u32(msg, DEVLINK_ATTR_PORT_NUMBER,
3970 attrs->phys.port_number))
3971 return -EMSGSIZE;
3972@@ -546,11 +805,167 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg,
3973 return 0;
3974 }
3975
3976-static int devlink_nl_port_fill(struct sk_buff *msg, struct devlink *devlink,
3977+static int
3978+devlink_port_fn_hw_addr_fill(struct devlink *devlink, const struct devlink_ops *ops,
3979+ struct devlink_port *port, struct sk_buff *msg,
3980+ struct netlink_ext_ack *extack, bool *msg_updated)
3981+{
3982+ u8 hw_addr[MAX_ADDR_LEN];
3983+ int hw_addr_len;
3984+ int err;
3985+
3986+ if (!ops->port_function_hw_addr_get)
3987+ return 0;
3988+
3989+ err = ops->port_function_hw_addr_get(devlink, port, hw_addr, &hw_addr_len, extack);
3990+ if (err) {
3991+ if (err == -EOPNOTSUPP)
3992+ return 0;
3993+ return err;
3994+ }
3995+ err = nla_put(msg, DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR, hw_addr_len, hw_addr);
3996+ if (err)
3997+ return err;
3998+ *msg_updated = true;
3999+ return 0;
4000+}
4001+
4002+static int devlink_nl_rate_fill(struct sk_buff *msg,
4003+ struct devlink_rate *devlink_rate,
4004+ enum devlink_command cmd, u32 portid, u32 seq,
4005+ int flags, struct netlink_ext_ack *extack)
4006+{
4007+ struct devlink *devlink = devlink_rate->devlink;
4008+ void *hdr;
4009+
4010+ hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
4011+ if (!hdr)
4012+ return -EMSGSIZE;
4013+
4014+ if (devlink_nl_put_handle(msg, devlink))
4015+ goto nla_put_failure;
4016+
4017+ if (nla_put_u16(msg, DEVLINK_ATTR_RATE_TYPE, devlink_rate->type))
4018+ goto nla_put_failure;
4019+
4020+ if (devlink_rate_is_leaf(devlink_rate)) {
4021+ if (nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX,
4022+ devlink_rate->devlink_port->index))
4023+ goto nla_put_failure;
4024+ } else if (devlink_rate_is_node(devlink_rate)) {
4025+ if (nla_put_string(msg, DEVLINK_ATTR_RATE_NODE_NAME,
4026+ devlink_rate->name))
4027+ goto nla_put_failure;
4028+ }
4029+
4030+ if (nla_put_u64_64bit(msg, DEVLINK_ATTR_RATE_TX_SHARE,
4031+ devlink_rate->tx_share, DEVLINK_ATTR_PAD))
4032+ goto nla_put_failure;
4033+
4034+ if (nla_put_u64_64bit(msg, DEVLINK_ATTR_RATE_TX_MAX,
4035+ devlink_rate->tx_max, DEVLINK_ATTR_PAD))
4036+ goto nla_put_failure;
4037+
4038+ if (devlink_rate->parent)
4039+ if (nla_put_string(msg, DEVLINK_ATTR_RATE_PARENT_NODE_NAME,
4040+ devlink_rate->parent->name))
4041+ goto nla_put_failure;
4042+
4043+ genlmsg_end(msg, hdr);
4044+ return 0;
4045+
4046+nla_put_failure:
4047+ genlmsg_cancel(msg, hdr);
4048+ return -EMSGSIZE;
4049+}
4050+
4051+static bool
4052+devlink_port_fn_state_valid(enum devlink_port_fn_state state)
4053+{
4054+ return state == DEVLINK_PORT_FN_STATE_INACTIVE ||
4055+ state == DEVLINK_PORT_FN_STATE_ACTIVE;
4056+}
4057+
4058+static bool
4059+devlink_port_fn_opstate_valid(enum devlink_port_fn_opstate opstate)
4060+{
4061+ return opstate == DEVLINK_PORT_FN_OPSTATE_DETACHED ||
4062+ opstate == DEVLINK_PORT_FN_OPSTATE_ATTACHED;
4063+}
4064+
4065+static int
4066+devlink_port_fn_state_fill(struct devlink *devlink,
4067+ const struct devlink_ops *ops,
4068+ struct devlink_port *port, struct sk_buff *msg,
4069+ struct netlink_ext_ack *extack,
4070+ bool *msg_updated)
4071+{
4072+ enum devlink_port_fn_opstate opstate;
4073+ enum devlink_port_fn_state state;
4074+ int err;
4075+
4076+ if (!ops->port_fn_state_get)
4077+ return 0;
4078+
4079+ err = ops->port_fn_state_get(devlink, port, &state, &opstate, extack);
4080+ if (err) {
4081+ if (err == -EOPNOTSUPP)
4082+ return 0;
4083+ return err;
4084+ }
4085+ if (!devlink_port_fn_state_valid(state)) {
4086+ WARN_ON_ONCE(1);
4087+ NL_SET_ERR_MSG_MOD(extack, "Invalid state read from driver");
4088+ return -EINVAL;
4089+ }
4090+ if (!devlink_port_fn_opstate_valid(opstate)) {
4091+ WARN_ON_ONCE(1);
4092+ NL_SET_ERR_MSG_MOD(extack,
4093+ "Invalid operational state read from driver");
4094+ return -EINVAL;
4095+ }
4096+ if (nla_put_u8(msg, DEVLINK_PORT_FN_ATTR_STATE, state) ||
4097+ nla_put_u8(msg, DEVLINK_PORT_FN_ATTR_OPSTATE, opstate))
4098+ return -EMSGSIZE;
4099+ *msg_updated = true;
4100+ return 0;
4101+}
4102+
4103+static int
4104+devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port *port,
4105+ struct netlink_ext_ack *extack)
4106+{
4107+ struct devlink *devlink = port->devlink;
4108+ const struct devlink_ops *ops;
4109+ struct nlattr *function_attr;
4110+ bool msg_updated = false;
4111+ int err;
4112+
4113+ function_attr = nla_nest_start_noflag(msg, DEVLINK_ATTR_PORT_FUNCTION);
4114+ if (!function_attr)
4115+ return -EMSGSIZE;
4116+
4117+ ops = devlink->ops;
4118+ err = devlink_port_fn_hw_addr_fill(devlink, ops, port, msg,
4119+ extack, &msg_updated);
4120+ if (err)
4121+ goto out;
4122+ err = devlink_port_fn_state_fill(devlink, ops, port, msg, extack,
4123+ &msg_updated);
4124+out:
4125+ if (err || !msg_updated)
4126+ nla_nest_cancel(msg, function_attr);
4127+ else
4128+ nla_nest_end(msg, function_attr);
4129+ return err;
4130+}
4131+
4132+static int devlink_nl_port_fill(struct sk_buff *msg,
4133 struct devlink_port *devlink_port,
4134- enum devlink_command cmd, u32 portid,
4135- u32 seq, int flags)
4136+ enum devlink_command cmd, u32 portid, u32 seq,
4137+ int flags, struct netlink_ext_ack *extack)
4138 {
4139+ struct devlink *devlink = devlink_port->devlink;
4140 void *hdr;
4141
4142 hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
4143@@ -593,6 +1008,8 @@ static int devlink_nl_port_fill(struct sk_buff *msg, struct devlink *devlink,
4144 rtnl_unlock();
4145 if (devlink_nl_port_attrs_put(msg, devlink_port))
4146 goto nla_put_failure;
4147+ if (devlink_nl_port_function_attrs_put(msg, devlink_port, extack))
4148+ goto nla_put_failure;
4149
4150 genlmsg_end(msg, hdr);
4151 return 0;
4152@@ -608,84 +1025,95 @@ nla_put_failure:
4153 static void devlink_port_notify(struct devlink_port *devlink_port,
4154 enum devlink_command cmd)
4155 {
4156- struct devlink *devlink = devlink_port->devlink;
4157 struct sk_buff *msg;
4158 int err;
4159
4160- if (!devlink_port->registered)
4161- return;
4162-
4163 WARN_ON(cmd != DEVLINK_CMD_PORT_NEW && cmd != DEVLINK_CMD_PORT_DEL);
4164
4165 msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
4166 if (!msg)
4167 return;
4168
4169- err = devlink_nl_port_fill(msg, devlink, devlink_port, cmd, 0, 0, 0);
4170+ err = devlink_nl_port_fill(msg, devlink_port, cmd, 0, 0, 0, NULL);
4171 if (err) {
4172 nlmsg_free(msg);
4173 return;
4174 }
4175
4176- genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink),
4177- msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL);
4178+ genlmsg_multicast_netns(&devlink_nl_family,
4179+ devlink_net(devlink_port->devlink), msg, 0,
4180+ DEVLINK_MCGRP_CONFIG, GFP_KERNEL);
4181 }
4182
4183-static int devlink_nl_cmd_get_doit(struct sk_buff *skb, struct genl_info *info)
4184+static void devlink_rate_notify(struct devlink_rate *devlink_rate,
4185+ enum devlink_command cmd)
4186 {
4187- struct devlink *devlink = info->user_ptr[0];
4188 struct sk_buff *msg;
4189 int err;
4190
4191+ WARN_ON(cmd != DEVLINK_CMD_RATE_NEW && cmd != DEVLINK_CMD_RATE_DEL);
4192+
4193 msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
4194 if (!msg)
4195- return -ENOMEM;
4196+ return;
4197
4198- err = devlink_nl_fill(msg, devlink, DEVLINK_CMD_NEW,
4199- info->snd_portid, info->snd_seq, 0);
4200+ err = devlink_nl_rate_fill(msg, devlink_rate, cmd, 0, 0, 0, NULL);
4201 if (err) {
4202 nlmsg_free(msg);
4203- return err;
4204+ return;
4205 }
4206
4207- return genlmsg_reply(msg, info);
4208+ genlmsg_multicast_netns(&devlink_nl_family,
4209+ devlink_net(devlink_rate->devlink), msg, 0,
4210+ DEVLINK_MCGRP_CONFIG, GFP_KERNEL);
4211 }
4212
4213-static int devlink_nl_cmd_get_dumpit(struct sk_buff *msg,
4214- struct netlink_callback *cb)
4215+static int devlink_nl_cmd_rate_get_dumpit(struct sk_buff *msg,
4216+ struct netlink_callback *cb)
4217 {
4218+ struct devlink_rate *devlink_rate;
4219 struct devlink *devlink;
4220 int start = cb->args[0];
4221 int idx = 0;
4222- int err;
4223+ int err = 0;
4224
4225 mutex_lock(&devlink_mutex);
4226 list_for_each_entry(devlink, &devlink_list, list) {
4227 if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
4228 continue;
4229- if (idx < start) {
4230+ mutex_lock(&devlink->lock);
4231+ list_for_each_entry(devlink_rate, &devlink->rate_list, list) {
4232+ enum devlink_command cmd = DEVLINK_CMD_RATE_NEW;
4233+ u32 id = NETLINK_CB(cb->skb).portid;
4234+
4235+ if (idx < start) {
4236+ idx++;
4237+ continue;
4238+ }
4239+ err = devlink_nl_rate_fill(msg, devlink_rate, cmd, id,
4240+ cb->nlh->nlmsg_seq,
4241+ NLM_F_MULTI, NULL);
4242+ if (err) {
4243+ mutex_unlock(&devlink->lock);
4244+ goto out;
4245+ }
4246 idx++;
4247- continue;
4248 }
4249- err = devlink_nl_fill(msg, devlink, DEVLINK_CMD_NEW,
4250- NETLINK_CB(cb->skb).portid,
4251- cb->nlh->nlmsg_seq, NLM_F_MULTI);
4252- if (err)
4253- goto out;
4254- idx++;
4255+ mutex_unlock(&devlink->lock);
4256 }
4257 out:
4258 mutex_unlock(&devlink_mutex);
4259+ if (err != -EMSGSIZE)
4260+ return err;
4261
4262 cb->args[0] = idx;
4263 return msg->len;
4264 }
4265
4266-static int devlink_nl_cmd_port_get_doit(struct sk_buff *skb,
4267+static int devlink_nl_cmd_rate_get_doit(struct sk_buff *skb,
4268 struct genl_info *info)
4269 {
4270- struct devlink_port *devlink_port = info->user_ptr[0];
4271- struct devlink *devlink = devlink_port->devlink;
4272+ struct devlink_rate *devlink_rate = info->user_ptr[1];
4273 struct sk_buff *msg;
4274 int err;
4275
4276@@ -693,9 +1121,9 @@ static int devlink_nl_cmd_port_get_doit(struct sk_buff *skb,
4277 if (!msg)
4278 return -ENOMEM;
4279
4280- err = devlink_nl_port_fill(msg, devlink, devlink_port,
4281- DEVLINK_CMD_PORT_NEW,
4282- info->snd_portid, info->snd_seq, 0);
4283+ err = devlink_nl_rate_fill(msg, devlink_rate, DEVLINK_CMD_RATE_NEW,
4284+ info->snd_portid, info->snd_seq, 0,
4285+ info->extack);
4286 if (err) {
4287 nlmsg_free(msg);
4288 return err;
4289@@ -704,8 +1132,92 @@ static int devlink_nl_cmd_port_get_doit(struct sk_buff *skb,
4290 return genlmsg_reply(msg, info);
4291 }
4292
4293-static int devlink_nl_cmd_port_get_dumpit(struct sk_buff *msg,
4294- struct netlink_callback *cb)
4295+static bool
4296+devlink_rate_is_parent_node(struct devlink_rate *devlink_rate,
4297+ struct devlink_rate *parent)
4298+{
4299+ while (parent) {
4300+ if (parent == devlink_rate)
4301+ return true;
4302+ parent = parent->parent;
4303+ }
4304+ return false;
4305+}
4306+
4307+static int devlink_nl_cmd_get_doit(struct sk_buff *skb, struct genl_info *info)
4308+{
4309+ struct devlink *devlink = info->user_ptr[0];
4310+ struct sk_buff *msg;
4311+ int err;
4312+
4313+ msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
4314+ if (!msg)
4315+ return -ENOMEM;
4316+
4317+ err = devlink_nl_fill(msg, devlink, DEVLINK_CMD_NEW,
4318+ info->snd_portid, info->snd_seq, 0);
4319+ if (err) {
4320+ nlmsg_free(msg);
4321+ return err;
4322+ }
4323+
4324+ return genlmsg_reply(msg, info);
4325+}
4326+
4327+static int devlink_nl_cmd_get_dumpit(struct sk_buff *msg,
4328+ struct netlink_callback *cb)
4329+{
4330+ struct devlink *devlink;
4331+ int start = cb->args[0];
4332+ int idx = 0;
4333+ int err;
4334+
4335+ mutex_lock(&devlink_mutex);
4336+ list_for_each_entry(devlink, &devlink_list, list) {
4337+ if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
4338+ continue;
4339+ if (idx < start) {
4340+ idx++;
4341+ continue;
4342+ }
4343+ err = devlink_nl_fill(msg, devlink, DEVLINK_CMD_NEW,
4344+ NETLINK_CB(cb->skb).portid,
4345+ cb->nlh->nlmsg_seq, NLM_F_MULTI);
4346+ if (err)
4347+ goto out;
4348+ idx++;
4349+ }
4350+out:
4351+ mutex_unlock(&devlink_mutex);
4352+
4353+ cb->args[0] = idx;
4354+ return msg->len;
4355+}
4356+
4357+static int devlink_nl_cmd_port_get_doit(struct sk_buff *skb,
4358+ struct genl_info *info)
4359+{
4360+ struct devlink_port *devlink_port = info->user_ptr[1];
4361+ struct sk_buff *msg;
4362+ int err;
4363+
4364+ msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
4365+ if (!msg)
4366+ return -ENOMEM;
4367+
4368+ err = devlink_nl_port_fill(msg, devlink_port, DEVLINK_CMD_PORT_NEW,
4369+ info->snd_portid, info->snd_seq, 0,
4370+ info->extack);
4371+ if (err) {
4372+ nlmsg_free(msg);
4373+ return err;
4374+ }
4375+
4376+ return genlmsg_reply(msg, info);
4377+}
4378+
4379+static int devlink_nl_cmd_port_get_dumpit(struct sk_buff *msg,
4380+ struct netlink_callback *cb)
4381 {
4382 struct devlink *devlink;
4383 struct devlink_port *devlink_port;
4384@@ -723,11 +1235,11 @@ static int devlink_nl_cmd_port_get_dumpit(struct sk_buff *msg,
4385 idx++;
4386 continue;
4387 }
4388- err = devlink_nl_port_fill(msg, devlink, devlink_port,
4389+ err = devlink_nl_port_fill(msg, devlink_port,
4390 DEVLINK_CMD_NEW,
4391 NETLINK_CB(cb->skb).portid,
4392 cb->nlh->nlmsg_seq,
4393- NLM_F_MULTI);
4394+ NLM_F_MULTI, cb->extack);
4395 if (err) {
4396 mutex_unlock(&devlink->lock);
4397 goto out;
4398@@ -765,10 +1277,95 @@ static int devlink_port_type_set(struct devlink *devlink,
4399 return -EOPNOTSUPP;
4400 }
4401
4402+static int
4403+devlink_port_function_hw_addr_set(struct devlink *devlink, struct devlink_port *port,
4404+ const struct nlattr *attr, struct netlink_ext_ack *extack)
4405+{
4406+ const struct devlink_ops *ops;
4407+ const u8 *hw_addr;
4408+ int hw_addr_len;
4409+
4410+ hw_addr = nla_data(attr);
4411+ hw_addr_len = nla_len(attr);
4412+ if (hw_addr_len > MAX_ADDR_LEN) {
4413+ NL_SET_ERR_MSG_MOD(extack, "Port function hardware address too long");
4414+ return -EINVAL;
4415+ }
4416+ if (port->type == DEVLINK_PORT_TYPE_ETH) {
4417+ if (hw_addr_len != ETH_ALEN) {
4418+ NL_SET_ERR_MSG_MOD(extack, "Address must be 6 bytes for Ethernet device");
4419+ return -EINVAL;
4420+ }
4421+ if (!is_unicast_ether_addr(hw_addr)) {
4422+ NL_SET_ERR_MSG_MOD(extack, "Non-unicast hardware address unsupported");
4423+ return -EINVAL;
4424+ }
4425+ }
4426+
4427+ ops = devlink->ops;
4428+ if (!ops->port_function_hw_addr_set) {
4429+ NL_SET_ERR_MSG_MOD(extack, "Port doesn't support function attributes");
4430+ return -EOPNOTSUPP;
4431+ }
4432+
4433+ return ops->port_function_hw_addr_set(devlink, port, hw_addr, hw_addr_len, extack);
4434+}
4435+
4436+static int devlink_port_fn_state_set(struct devlink *devlink,
4437+ struct devlink_port *port,
4438+ const struct nlattr *attr,
4439+ struct netlink_ext_ack *extack)
4440+{
4441+ enum devlink_port_fn_state state;
4442+ const struct devlink_ops *ops;
4443+
4444+ state = nla_get_u8(attr);
4445+ ops = devlink->ops;
4446+ if (!ops->port_fn_state_set) {
4447+ NL_SET_ERR_MSG_MOD(extack,
4448+ "Function does not support state setting");
4449+ return -EOPNOTSUPP;
4450+ }
4451+ return ops->port_fn_state_set(devlink, port, state, extack);
4452+}
4453+
4454+static int
4455+devlink_port_function_set(struct devlink *devlink, struct devlink_port *port,
4456+ const struct nlattr *attr, struct netlink_ext_ack *extack)
4457+{
4458+ struct nlattr *tb[DEVLINK_PORT_FUNCTION_ATTR_MAX + 1];
4459+ int err;
4460+
4461+ err = nla_parse_nested(tb, DEVLINK_PORT_FUNCTION_ATTR_MAX, attr,
4462+ devlink_function_nl_policy, extack);
4463+ if (err < 0) {
4464+ NL_SET_ERR_MSG_MOD(extack, "Fail to parse port function attributes");
4465+ return err;
4466+ }
4467+
4468+ attr = tb[DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR];
4469+ if (attr) {
4470+ err = devlink_port_function_hw_addr_set(devlink, port, attr, extack);
4471+ if (err)
4472+ return err;
4473+ }
4474+ /* Keep this as the last function attribute set, so that when
4475+ * multiple port function attributes are set along with state,
4476+ * those can be applied first before activating the state.
4477+ */
4478+ attr = tb[DEVLINK_PORT_FN_ATTR_STATE];
4479+ if (attr)
4480+ err = devlink_port_fn_state_set(devlink, port, attr, extack);
4481+
4482+ if (!err)
4483+ devlink_port_notify(port, DEVLINK_CMD_PORT_NEW);
4484+ return err;
4485+}
4486+
4487 static int devlink_nl_cmd_port_set_doit(struct sk_buff *skb,
4488 struct genl_info *info)
4489 {
4490- struct devlink_port *devlink_port = info->user_ptr[0];
4491+ struct devlink_port *devlink_port = info->user_ptr[1];
4492 struct devlink *devlink = devlink_port->devlink;
4493 int err;
4494
4495@@ -780,6 +1377,16 @@ static int devlink_nl_cmd_port_set_doit(struct sk_buff *skb,
4496 if (err)
4497 return err;
4498 }
4499+
4500+ if (info->attrs[DEVLINK_ATTR_PORT_FUNCTION]) {
4501+ struct nlattr *attr = info->attrs[DEVLINK_ATTR_PORT_FUNCTION];
4502+ struct netlink_ext_ack *extack = info->extack;
4503+
4504+ err = devlink_port_function_set(devlink, devlink_port, attr, extack);
4505+ if (err)
4506+ return err;
4507+ }
4508+
4509 return 0;
4510 }
4511
4512@@ -831,6 +1438,359 @@ static int devlink_nl_cmd_port_unsplit_doit(struct sk_buff *skb,
4513 return devlink_port_unsplit(devlink, port_index, info->extack);
4514 }
4515
4516+static int devlink_port_new_notifiy(struct devlink *devlink,
4517+ unsigned int port_index,
4518+ struct genl_info *info)
4519+{
4520+ struct devlink_port *devlink_port;
4521+ struct sk_buff *msg;
4522+ int err;
4523+
4524+ msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
4525+ if (!msg)
4526+ return -ENOMEM;
4527+
4528+ mutex_lock(&devlink->lock);
4529+ devlink_port = devlink_port_get_by_index(devlink, port_index);
4530+ if (!devlink_port) {
4531+ err = -ENODEV;
4532+ goto out;
4533+ }
4534+
4535+ err = devlink_nl_port_fill(msg, devlink_port, DEVLINK_CMD_PORT_NEW,
4536+ info->snd_portid, info->snd_seq, 0, NULL);
4537+ if (err)
4538+ goto out;
4539+
4540+ err = genlmsg_reply(msg, info);
4541+ mutex_unlock(&devlink->lock);
4542+ return err;
4543+
4544+out:
4545+ mutex_unlock(&devlink->lock);
4546+ nlmsg_free(msg);
4547+ return err;
4548+}
4549+
4550+static int devlink_nl_cmd_port_new_doit(struct sk_buff *skb,
4551+ struct genl_info *info)
4552+{
4553+ struct netlink_ext_ack *extack = info->extack;
4554+ struct devlink_port_new_attrs new_attrs = {};
4555+ struct devlink *devlink = info->user_ptr[0];
4556+ unsigned int new_port_index;
4557+ int err;
4558+
4559+ if (!devlink->ops->port_new || !devlink->ops->port_del)
4560+ return -EOPNOTSUPP;
4561+
4562+ if (!info->attrs[DEVLINK_ATTR_PORT_FLAVOUR] ||
4563+ !info->attrs[DEVLINK_ATTR_PORT_PCI_PF_NUMBER]) {
4564+ NL_SET_ERR_MSG_MOD(extack, "Port flavour or PCI PF are not specified");
4565+ return -EINVAL;
4566+ }
4567+ new_attrs.flavour = nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_FLAVOUR]);
4568+ new_attrs.pfnum =
4569+ nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_PCI_PF_NUMBER]);
4570+
4571+ if (info->attrs[DEVLINK_ATTR_PORT_INDEX]) {
4572+ /* Port index of the new port being created by driver. */
4573+ new_attrs.port_index =
4574+ nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]);
4575+ new_attrs.port_index_valid = true;
4576+ }
4577+ if (info->attrs[DEVLINK_ATTR_PORT_CONTROLLER_NUMBER]) {
4578+ new_attrs.controller =
4579+ nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_CONTROLLER_NUMBER]);
4580+ new_attrs.controller_valid = true;
4581+ }
4582+ if (new_attrs.flavour == DEVLINK_PORT_FLAVOUR_PCI_SF &&
4583+ info->attrs[DEVLINK_ATTR_PORT_PCI_SF_NUMBER]) {
4584+ new_attrs.sfnum = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_PCI_SF_NUMBER]);
4585+ new_attrs.sfnum_valid = true;
4586+ }
4587+
4588+ err = devlink->ops->port_new(devlink, &new_attrs, extack,
4589+ &new_port_index);
4590+ if (err)
4591+ return err;
4592+
4593+ err = devlink_port_new_notifiy(devlink, new_port_index, info);
4594+ if (err && err != -ENODEV) {
4595+ /* Failed to send the response; destroy the newly created port. */
4596+ devlink->ops->port_del(devlink, new_port_index, extack);
4597+ }
4598+ return err;
4599+}
4600+
4601+static int devlink_nl_cmd_port_del_doit(struct sk_buff *skb,
4602+ struct genl_info *info)
4603+{
4604+ struct netlink_ext_ack *extack = info->extack;
4605+ struct devlink *devlink = info->user_ptr[0];
4606+ unsigned int port_index;
4607+
4608+ if (!devlink->ops->port_del)
4609+ return -EOPNOTSUPP;
4610+
4611+ if (!info->attrs[DEVLINK_ATTR_PORT_INDEX]) {
4612+ NL_SET_ERR_MSG_MOD(extack, "Port index is not specified");
4613+ return -EINVAL;
4614+ }
4615+ port_index = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]);
4616+
4617+ return devlink->ops->port_del(devlink, port_index, extack);
4618+}
4619+
4620+static int
4621+devlink_nl_rate_parent_node_set(struct devlink_rate *devlink_rate,
4622+ struct genl_info *info,
4623+ struct nlattr *nla_parent)
4624+{
4625+ struct devlink *devlink = devlink_rate->devlink;
4626+ const char *parent_name = nla_data(nla_parent);
4627+ const struct devlink_ops *ops = devlink->ops;
4628+ size_t len = strlen(parent_name);
4629+ struct devlink_rate *parent;
4630+ int err = -EOPNOTSUPP;
4631+
4632+ parent = devlink_rate->parent;
4633+ if (parent && len) {
4634+ NL_SET_ERR_MSG_MOD(info->extack, "Rate object already has parent.");
4635+ return -EBUSY;
4636+ } else if (parent && !len) {
4637+ if (devlink_rate_is_leaf(devlink_rate))
4638+ err = ops->rate_leaf_parent_set(devlink_rate, NULL,
4639+ devlink_rate->priv, NULL,
4640+ info->extack);
4641+ else if (devlink_rate_is_node(devlink_rate))
4642+ err = ops->rate_node_parent_set(devlink_rate, NULL,
4643+ devlink_rate->priv, NULL,
4644+ info->extack);
4645+ if (err)
4646+ return err;
4647+
4648+ refcount_dec(&parent->refcnt);
4649+ devlink_rate->parent = NULL;
4650+ } else if (!parent && len) {
4651+ parent = devlink_rate_node_get_by_name(devlink, parent_name);
4652+ if (IS_ERR(parent))
4653+ return -ENODEV;
4654+
4655+ if (parent == devlink_rate) {
4656+ NL_SET_ERR_MSG_MOD(info->extack, "Parent to self is not allowed");
4657+ return -EINVAL;
4658+ }
4659+
4660+ if (devlink_rate_is_node(devlink_rate) &&
4661+ devlink_rate_is_parent_node(devlink_rate, parent->parent)) {
4662+ NL_SET_ERR_MSG_MOD(info->extack, "Node is already a parent of parent node.");
4663+ return -EEXIST;
4664+ }
4665+
4666+ if (devlink_rate_is_leaf(devlink_rate))
4667+ err = ops->rate_leaf_parent_set(devlink_rate, parent,
4668+ devlink_rate->priv, parent->priv,
4669+ info->extack);
4670+ else if (devlink_rate_is_node(devlink_rate))
4671+ err = ops->rate_node_parent_set(devlink_rate, parent,
4672+ devlink_rate->priv, parent->priv,
4673+ info->extack);
4674+ if (err)
4675+ return err;
4676+
4677+ refcount_inc(&parent->refcnt);
4678+ devlink_rate->parent = parent;
4679+ }
4680+
4681+ return 0;
4682+}
4683+
4684+static int devlink_nl_rate_set(struct devlink_rate *devlink_rate,
4685+ const struct devlink_ops *ops,
4686+ struct genl_info *info)
4687+{
4688+ struct nlattr *nla_parent, **attrs = info->attrs;
4689+ int err = -EOPNOTSUPP;
4690+ u64 rate;
4691+
4692+ if (attrs[DEVLINK_ATTR_RATE_TX_SHARE]) {
4693+ rate = nla_get_u64(attrs[DEVLINK_ATTR_RATE_TX_SHARE]);
4694+ if (devlink_rate_is_leaf(devlink_rate))
4695+ err = ops->rate_leaf_tx_share_set(devlink_rate, devlink_rate->priv,
4696+ rate, info->extack);
4697+ else if (devlink_rate_is_node(devlink_rate))
4698+ err = ops->rate_node_tx_share_set(devlink_rate, devlink_rate->priv,
4699+ rate, info->extack);
4700+ if (err)
4701+ return err;
4702+ devlink_rate->tx_share = rate;
4703+ }
4704+
4705+ if (attrs[DEVLINK_ATTR_RATE_TX_MAX]) {
4706+ rate = nla_get_u64(attrs[DEVLINK_ATTR_RATE_TX_MAX]);
4707+ if (devlink_rate_is_leaf(devlink_rate))
4708+ err = ops->rate_leaf_tx_max_set(devlink_rate, devlink_rate->priv,
4709+ rate, info->extack);
4710+ else if (devlink_rate_is_node(devlink_rate))
4711+ err = ops->rate_node_tx_max_set(devlink_rate, devlink_rate->priv,
4712+ rate, info->extack);
4713+ if (err)
4714+ return err;
4715+ devlink_rate->tx_max = rate;
4716+ }
4717+
4718+ nla_parent = attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME];
4719+ if (nla_parent) {
4720+ err = devlink_nl_rate_parent_node_set(devlink_rate, info,
4721+ nla_parent);
4722+ if (err)
4723+ return err;
4724+ }
4725+
4726+ return 0;
4727+}
4728+
4729+static bool devlink_rate_set_ops_supported(const struct devlink_ops *ops,
4730+ struct genl_info *info,
4731+ enum devlink_rate_type type)
4732+{
4733+ struct nlattr **attrs = info->attrs;
4734+
4735+ if (type == DEVLINK_RATE_TYPE_LEAF) {
4736+ if (attrs[DEVLINK_ATTR_RATE_TX_SHARE] && !ops->rate_leaf_tx_share_set) {
4737+ NL_SET_ERR_MSG_MOD(info->extack, "TX share set isn't supported for the leafs");
4738+ return false;
4739+ }
4740+ if (attrs[DEVLINK_ATTR_RATE_TX_MAX] && !ops->rate_leaf_tx_max_set) {
4741+ NL_SET_ERR_MSG_MOD(info->extack, "TX max set isn't supported for the leafs");
4742+ return false;
4743+ }
4744+ if (attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME] &&
4745+ !ops->rate_leaf_parent_set) {
4746+ NL_SET_ERR_MSG_MOD(info->extack, "Parent set isn't supported for the leafs");
4747+ return false;
4748+ }
4749+ } else if (type == DEVLINK_RATE_TYPE_NODE) {
4750+ if (attrs[DEVLINK_ATTR_RATE_TX_SHARE] && !ops->rate_node_tx_share_set) {
4751+ NL_SET_ERR_MSG_MOD(info->extack, "TX share set isn't supported for the nodes");
4752+ return false;
4753+ }
4754+ if (attrs[DEVLINK_ATTR_RATE_TX_MAX] && !ops->rate_node_tx_max_set) {
4755+ NL_SET_ERR_MSG_MOD(info->extack, "TX max set isn't supported for the nodes");
4756+ return false;
4757+ }
4758+ if (attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME] &&
4759+ !ops->rate_node_parent_set) {
4760+ NL_SET_ERR_MSG_MOD(info->extack, "Parent set isn't supported for the nodes");
4761+ return false;
4762+ }
4763+ } else {
4764+ WARN_ON("Unknown type of rate object");
4765+ return false;
4766+ }
4767+
4768+ return true;
4769+}
4770+
4771+static int devlink_nl_cmd_rate_set_doit(struct sk_buff *skb,
4772+ struct genl_info *info)
4773+{
4774+ struct devlink_rate *devlink_rate = info->user_ptr[1];
4775+ struct devlink *devlink = devlink_rate->devlink;
4776+ const struct devlink_ops *ops = devlink->ops;
4777+ int err;
4778+
4779+ if (!ops || !devlink_rate_set_ops_supported(ops, info, devlink_rate->type))
4780+ return -EOPNOTSUPP;
4781+
4782+ err = devlink_nl_rate_set(devlink_rate, ops, info);
4783+
4784+ if (!err)
4785+ devlink_rate_notify(devlink_rate, DEVLINK_CMD_RATE_NEW);
4786+ return err;
4787+}
4788+
4789+static int devlink_nl_cmd_rate_new_doit(struct sk_buff *skb,
4790+ struct genl_info *info)
4791+{
4792+ struct devlink *devlink = info->user_ptr[0];
4793+ struct devlink_rate *rate_node;
4794+ const struct devlink_ops *ops;
4795+ int err;
4796+
4797+ ops = devlink->ops;
4798+ if (!ops || !ops->rate_node_new || !ops->rate_node_del) {
4799+ NL_SET_ERR_MSG_MOD(info->extack, "Rate nodes aren't supported");
4800+ return -EOPNOTSUPP;
4801+ }
4802+
4803+ if (!devlink_rate_set_ops_supported(ops, info, DEVLINK_RATE_TYPE_NODE))
4804+ return -EOPNOTSUPP;
4805+
4806+ rate_node = devlink_rate_node_get_from_attrs(devlink, info->attrs);
4807+ if (!IS_ERR(rate_node))
4808+ return -EEXIST;
4809+ else if (rate_node == ERR_PTR(-EINVAL))
4810+ return -EINVAL;
4811+
4812+ rate_node = kzalloc(sizeof(*rate_node), GFP_KERNEL);
4813+ if (!rate_node)
4814+ return -ENOMEM;
4815+
4816+ rate_node->devlink = devlink;
4817+ rate_node->type = DEVLINK_RATE_TYPE_NODE;
4818+ rate_node->name = nla_strdup(info->attrs[DEVLINK_ATTR_RATE_NODE_NAME], GFP_KERNEL);
4819+ if (!rate_node->name) {
4820+ err = -ENOMEM;
4821+ goto err_strdup;
4822+ }
4823+
4824+ err = ops->rate_node_new(rate_node, &rate_node->priv, info->extack);
4825+ if (err)
4826+ goto err_node_new;
4827+
4828+ err = devlink_nl_rate_set(rate_node, ops, info);
4829+ if (err)
4830+ goto err_rate_set;
4831+
4832+ refcount_set(&rate_node->refcnt, 1);
4833+ list_add(&rate_node->list, &devlink->rate_list);
4834+ devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_NEW);
4835+ return 0;
4836+
4837+err_rate_set:
4838+ ops->rate_node_del(rate_node, rate_node->priv, info->extack);
4839+err_node_new:
4840+ kfree(rate_node->name);
4841+err_strdup:
4842+ kfree(rate_node);
4843+ return err;
4844+}
4845+
4846+static int devlink_nl_cmd_rate_del_doit(struct sk_buff *skb,
4847+ struct genl_info *info)
4848+{
4849+ struct devlink_rate *rate_node = info->user_ptr[1];
4850+ struct devlink *devlink = rate_node->devlink;
4851+ const struct devlink_ops *ops = devlink->ops;
4852+ int err;
4853+
4854+ if (refcount_read(&rate_node->refcnt) > 1) {
4855+ NL_SET_ERR_MSG_MOD(info->extack, "Node has children. Cannot delete node.");
4856+ return -EBUSY;
4857+ }
4858+
4859+ devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_DEL);
4860+ err = ops->rate_node_del(rate_node, rate_node->priv, info->extack);
4861+ if (rate_node->parent)
4862+ refcount_dec(&rate_node->parent->refcnt);
4863+ list_del(&rate_node->list);
4864+ kfree(rate_node->name);
4865+ kfree(rate_node);
4866+ return err;
4867+}
4868+
4869 static int devlink_nl_sb_fill(struct sk_buff *msg, struct devlink *devlink,
4870 struct devlink_sb *devlink_sb,
4871 enum devlink_command cmd, u32 portid,
4872@@ -873,10 +1833,14 @@ static int devlink_nl_cmd_sb_get_doit(struct sk_buff *skb,
4873 struct genl_info *info)
4874 {
4875 struct devlink *devlink = info->user_ptr[0];
4876- struct devlink_sb *devlink_sb = info->user_ptr[1];
4877+ struct devlink_sb *devlink_sb;
4878 struct sk_buff *msg;
4879 int err;
4880
4881+ devlink_sb = devlink_sb_get_from_info(devlink, info);
4882+ if (IS_ERR(devlink_sb))
4883+ return PTR_ERR(devlink_sb);
4884+
4885 msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
4886 if (!msg)
4887 return -ENOMEM;
4888@@ -978,11 +1942,15 @@ static int devlink_nl_cmd_sb_pool_get_doit(struct sk_buff *skb,
4889 struct genl_info *info)
4890 {
4891 struct devlink *devlink = info->user_ptr[0];
4892- struct devlink_sb *devlink_sb = info->user_ptr[1];
4893+ struct devlink_sb *devlink_sb;
4894 struct sk_buff *msg;
4895 u16 pool_index;
4896 int err;
4897
4898+ devlink_sb = devlink_sb_get_from_info(devlink, info);
4899+ if (IS_ERR(devlink_sb))
4900+ return PTR_ERR(devlink_sb);
4901+
4902 err = devlink_sb_pool_index_get_from_info(devlink_sb, info,
4903 &pool_index);
4904 if (err)
4905@@ -1052,7 +2020,9 @@ static int devlink_nl_cmd_sb_pool_get_dumpit(struct sk_buff *msg,
4906 devlink_sb,
4907 NETLINK_CB(cb->skb).portid,
4908 cb->nlh->nlmsg_seq);
4909- if (err && err != -EOPNOTSUPP) {
4910+ if (err == -EOPNOTSUPP) {
4911+ err = 0;
4912+ } else if (err) {
4913 mutex_unlock(&devlink->lock);
4914 goto out;
4915 }
4916@@ -1084,12 +2054,16 @@ static int devlink_nl_cmd_sb_pool_set_doit(struct sk_buff *skb,
4917 struct genl_info *info)
4918 {
4919 struct devlink *devlink = info->user_ptr[0];
4920- struct devlink_sb *devlink_sb = info->user_ptr[1];
4921 enum devlink_sb_threshold_type threshold_type;
4922+ struct devlink_sb *devlink_sb;
4923 u16 pool_index;
4924 u32 size;
4925 int err;
4926
4927+ devlink_sb = devlink_sb_get_from_info(devlink, info);
4928+ if (IS_ERR(devlink_sb))
4929+ return PTR_ERR(devlink_sb);
4930+
4931 err = devlink_sb_pool_index_get_from_info(devlink_sb, info,
4932 &pool_index);
4933 if (err)
4934@@ -1170,13 +2144,17 @@ sb_occ_get_failure:
4935 static int devlink_nl_cmd_sb_port_pool_get_doit(struct sk_buff *skb,
4936 struct genl_info *info)
4937 {
4938- struct devlink_port *devlink_port = info->user_ptr[0];
4939+ struct devlink_port *devlink_port = info->user_ptr[1];
4940 struct devlink *devlink = devlink_port->devlink;
4941- struct devlink_sb *devlink_sb = info->user_ptr[1];
4942+ struct devlink_sb *devlink_sb;
4943 struct sk_buff *msg;
4944 u16 pool_index;
4945 int err;
4946
4947+ devlink_sb = devlink_sb_get_from_info(devlink, info);
4948+ if (IS_ERR(devlink_sb))
4949+ return PTR_ERR(devlink_sb);
4950+
4951 err = devlink_sb_pool_index_get_from_info(devlink_sb, info,
4952 &pool_index);
4953 if (err)
4954@@ -1252,7 +2230,9 @@ static int devlink_nl_cmd_sb_port_pool_get_dumpit(struct sk_buff *msg,
4955 devlink, devlink_sb,
4956 NETLINK_CB(cb->skb).portid,
4957 cb->nlh->nlmsg_seq);
4958- if (err && err != -EOPNOTSUPP) {
4959+ if (err == -EOPNOTSUPP) {
4960+ err = 0;
4961+ } else if (err) {
4962 mutex_unlock(&devlink->lock);
4963 goto out;
4964 }
4965@@ -1283,12 +2263,17 @@ static int devlink_sb_port_pool_set(struct devlink_port *devlink_port,
4966 static int devlink_nl_cmd_sb_port_pool_set_doit(struct sk_buff *skb,
4967 struct genl_info *info)
4968 {
4969- struct devlink_port *devlink_port = info->user_ptr[0];
4970- struct devlink_sb *devlink_sb = info->user_ptr[1];
4971+ struct devlink_port *devlink_port = info->user_ptr[1];
4972+ struct devlink *devlink = info->user_ptr[0];
4973+ struct devlink_sb *devlink_sb;
4974 u16 pool_index;
4975 u32 threshold;
4976 int err;
4977
4978+ devlink_sb = devlink_sb_get_from_info(devlink, info);
4979+ if (IS_ERR(devlink_sb))
4980+ return PTR_ERR(devlink_sb);
4981+
4982 err = devlink_sb_pool_index_get_from_info(devlink_sb, info,
4983 &pool_index);
4984 if (err)
4985@@ -1370,14 +2355,18 @@ nla_put_failure:
4986 static int devlink_nl_cmd_sb_tc_pool_bind_get_doit(struct sk_buff *skb,
4987 struct genl_info *info)
4988 {
4989- struct devlink_port *devlink_port = info->user_ptr[0];
4990+ struct devlink_port *devlink_port = info->user_ptr[1];
4991 struct devlink *devlink = devlink_port->devlink;
4992- struct devlink_sb *devlink_sb = info->user_ptr[1];
4993+ struct devlink_sb *devlink_sb;
4994 struct sk_buff *msg;
4995 enum devlink_sb_pool_type pool_type;
4996 u16 tc_index;
4997 int err;
4998
4999+ devlink_sb = devlink_sb_get_from_info(devlink, info);
5000+ if (IS_ERR(devlink_sb))
The diff has been truncated for viewing.
