~bodong-wang/ubuntu/+source/linux-bluefield:nvidia-devlink

Last commit made on 2022-03-01
Get this branch:
git clone -b nvidia-devlink https://git.launchpad.net/~bodong-wang/ubuntu/+source/linux-bluefield
Only Bodong Wang can upload to this branch. If you are Bodong Wang please log in for upload directions.

Branch merges

Branch information

Name:
nvidia-devlink
Repository:
lp:~bodong-wang/ubuntu/+source/linux-bluefield

Recent commits

e1f2032... by Moshe Shemesh <email address hidden>

devlink: Fix reload stats structure

BugLink: https://bugs.launchpad.net/bugs/1962490

Fix reload stats structure exposed to the user. Change stats structure
hierarchy to have the reload action as a parent of the stat entry and
then stat entry includes value per limit. This will also help to avoid
string concatenation on iproute2 output.

Reload stats structure before this fix:
"stats": {
    "reload": {
        "driver_reinit": 2,
        "fw_activate": 1,
        "fw_activate_no_reset": 0
     }
}

After this fix:
"stats": {
    "reload": {
        "driver_reinit": {
            "unspecified": 2
        },
        "fw_activate": {
            "unspecified": 1,
            "no_reset": 0
        }
}

Fixes: a254c264267e ("devlink: Add reload stats")
Signed-off-by: Moshe Shemesh <email address hidden>
Reviewed-by: Jiri Pirko <email address hidden>
Link: https://<email address hidden>
Signed-off-by: Jakub Kicinski <email address hidden>
(cherry picked from commit 5204bb683c1633e550c2124ccc2358dd645a80db)
Signed-off-by: Bodong Wang <email address hidden>

c8269a0... by Vladyslav Tarasiuk <email address hidden>

devlink: Rework devlink health reporter destructor

BugLink: https://bugs.launchpad.net/bugs/1962490

Devlink keeps its own reference to every reporter in a list and inits
refcount to 1 upon reporter's creation. Existing destructor waits to
free the memory indefinitely using msleep() until all references except
devlink's own are put.

Rework this mechanism by moving memory free routine to a separate
function, which is called when the last reporter reference is put.

Besides, it allows to call __devlink_health_reporter_destroy() while
locked on a reporters list mutex in symmetry to
__devlink_health_reporter_create(), which is required in follow-up
patch.

Signed-off-by: Vladyslav Tarasiuk <email address hidden>
Reviewed-by: Moshe Shemesh <email address hidden>
Reviewed-by: Jiri Pirko <email address hidden>
Signed-off-by: David S. Miller <email address hidden>
(cherry picked from commit 3c5584bf0a0493e8d232ade65f4b9c5e995f3a0c)
Signed-off-by: Bodong Wang <email address hidden>

7c1c03e... by Ido Schimmel <email address hidden>

drop_monitor: Convert to using devlink tracepoint

BugLink: https://bugs.launchpad.net/bugs/1962490

Convert drop monitor to use the recently introduced
'devlink_trap_report' tracepoint instead of having devlink call into
drop monitor.

This is both consistent with software originated drops ('kfree_skb'
tracepoint) and also allows drop monitor to be built as a module and
still report hardware originated drops.

Signed-off-by: Ido Schimmel <email address hidden>
Reviewed-by: Jiri Pirko <email address hidden>
Signed-off-by: David S. Miller <email address hidden>
(backported from commit 8ee2267ad33e0ba021e9dd9b437f773906cd99d6)
Signed-off-by: Bodong Wang <email address hidden>
[bodong: ignore doc and drop monitor]

Conflicts:
 MAINTAINERS
 include/net/drop_monitor.h

86e7116... by Ido Schimmel <email address hidden>

devlink: Add 'control' trap type

BugLink: https://bugs.launchpad.net/bugs/1962490

This type is used for traps that trap control packets such as ARP
request and IGMP query to the CPU.

Do not report such packets to the kernel's drop monitor as they were not
dropped by the device no encountered an exception during forwarding.

Signed-off-by: Ido Schimmel <email address hidden>
Reviewed-by: Jiri Pirko <email address hidden>
Signed-off-by: David S. Miller <email address hidden>
(cherry picked from commit 30a4e9a29ab9aadfe6c5386ae4aa396b1d2556c2)
Signed-off-by: Bodong Wang <email address hidden>

2cf5515... by Leon Romanovsky <email address hidden>

devlink: Remove check of always valid devlink pointer

BugLink: https://bugs.launchpad.net/bugs/1962490

Devlink objects are accessible only after they were registered and
have valid devlink_*->devlink pointers.

Remove that check and simplify respective fill functions as an outcome
of such change.

Signed-off-by: Leon Romanovsky <email address hidden>
Signed-off-by: David S. Miller <email address hidden>
(backported from commit 7ca973dc9fe589dc0ab2650641f4c7a19cc49ecd)
Signed-off-by: Bodong Wang <email address hidden>
[bodong: fix conflict due to api change]

Conflicts:
 net/core/devlink.c

777e34d... by Jakub Kicinski <email address hidden>

devlink: factor out building a snapshot notification

BugLink: https://bugs.launchpad.net/bugs/1962490

We'll need to send snapshot info back on the socket
which requested a snapshot to be created. Factor out
constructing a snapshot description from the broadcast
notification code.

v3: new patch

Signed-off-by: Jakub Kicinski <email address hidden>
Reviewed-by: Jiri Pirko <email address hidden>
Reviewed-by: Jacob Keller <email address hidden>
Signed-off-by: David S. Miller <email address hidden>
(cherry picked from commit dd86fec7e06ab792fe470c66a67ff42bf5d72b91)
Signed-off-by: Bodong Wang <email address hidden>

24d0d78... by Leon Romanovsky <email address hidden>

devlink: Allocate devlink directly in requested net namespace

BugLink: https://bugs.launchpad.net/bugs/1962490

There is no need in extra call indirection and check from impossible
flow where someone tries to set namespace without prior call
to devlink_alloc().

Instead of this extra logic and additional EXPORT_SYMBOL, use specialized
devlink allocation function that receives net namespace as an argument.

Such specialized API allows clear view when devlink initialized in wrong
net namespace and/or kernel users don't try to change devlink namespace
under the hood.

Reviewed-by: Jiri Pirko <email address hidden>
Signed-off-by: Leon Romanovsky <email address hidden>
Signed-off-by: Jakub Kicinski <email address hidden>
(backported from commit 26713455048eb19122b1561b471d30710177ef97)
Signed-off-by: Bodong Wang <email address hidden>
[bodong: ignore netdevsim]

Conflicts:
 drivers/net/netdevsim/dev.c

19bb413... by Leon Romanovsky <email address hidden>

devlink: Break parameter notification sequence to be before/after unload/load driver

BugLink: https://bugs.launchpad.net/bugs/1962490

The change of namespaces during devlink reload calls to driver unload
before it accesses devlink parameters. The commands below causes to
use-after-free bug when trying to get flow steering mode.

 * ip netns add n1
 * devlink dev reload pci/0000:00:09.0 netns n1

 ==================================================================
 BUG: KASAN: use-after-free in mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
 Read of size 4 at addr ffff888009d04308 by task devlink/275

 CPU: 6 PID: 275 Comm: devlink Not tainted 5.12.0-rc2+ #2853
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 Call Trace:
  dump_stack+0x93/0xc2
  print_address_description.constprop.0+0x18/0x140
  ? mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
  ? mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
  kasan_report.cold+0x7c/0xd8
  ? mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
  mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
  devlink_nl_param_fill+0x1c8/0xe80
  ? __free_pages_ok+0x37a/0x8a0
  ? devlink_flash_update_timeout_notify+0xd0/0xd0
  ? lock_acquire+0x1a9/0x6d0
  ? fs_reclaim_acquire+0xb7/0x160
  ? lock_is_held_type+0x98/0x110
  ? 0xffffffff81000000
  ? lock_release+0x1f9/0x6c0
  ? fs_reclaim_release+0xa1/0xf0
  ? lock_downgrade+0x6d0/0x6d0
  ? lock_is_held_type+0x98/0x110
  ? lock_is_held_type+0x98/0x110
  ? memset+0x20/0x40
  ? __build_skb_around+0x1f8/0x2b0
  devlink_param_notify+0x6d/0x180
  devlink_reload+0x1c3/0x520
  ? devlink_remote_reload_actions_performed+0x30/0x30
  ? mutex_trylock+0x24b/0x2d0
  ? devlink_nl_cmd_reload+0x62b/0x1070
  devlink_nl_cmd_reload+0x66d/0x1070
  ? devlink_reload+0x520/0x520
  ? devlink_get_from_attrs+0x1bc/0x260
  ? devlink_nl_pre_doit+0x64/0x4d0
  genl_family_rcv_msg_doit+0x1e9/0x2f0
  ? mutex_lock_io_nested+0x1130/0x1130
  ? genl_family_rcv_msg_attrs_parse.constprop.0+0x240/0x240
  ? security_capable+0x51/0x90
  genl_rcv_msg+0x27f/0x4a0
  ? genl_get_cmd+0x3c0/0x3c0
  ? lock_acquire+0x1a9/0x6d0
  ? devlink_reload+0x520/0x520
  ? lock_release+0x6c0/0x6c0
  netlink_rcv_skb+0x11d/0x340
  ? genl_get_cmd+0x3c0/0x3c0
  ? netlink_ack+0x9f0/0x9f0
  ? lock_release+0x1f9/0x6c0
  genl_rcv+0x24/0x40
  netlink_unicast+0x433/0x700
  ? netlink_attachskb+0x730/0x730
  ? _copy_from_iter_full+0x178/0x650
  ? __alloc_skb+0x113/0x2b0
  netlink_sendmsg+0x6f1/0xbd0
  ? netlink_unicast+0x700/0x700
  ? lock_is_held_type+0x98/0x110
  ? netlink_unicast+0x700/0x700
  sock_sendmsg+0xb0/0xe0
  __sys_sendto+0x193/0x240
  ? __x64_sys_getpeername+0xb0/0xb0
  ? do_sys_openat2+0x10b/0x370
  ? __up_read+0x1a1/0x7b0
  ? do_user_addr_fault+0x219/0xdc0
  ? __x64_sys_openat+0x120/0x1d0
  ? __x64_sys_open+0x1a0/0x1a0
  __x64_sys_sendto+0xdd/0x1b0
  ? syscall_enter_from_user_mode+0x1d/0x50
  do_syscall_64+0x2d/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7fc69d0af14a
 Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c
 RSP: 002b:00007ffc1d8292f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
 RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fc69d0af14a
 RDX: 0000000000000038 RSI: 0000555f57c56440 RDI: 0000000000000003
 RBP: 0000555f57c56410 R08: 00007fc69d17b200 R09: 000000000000000c
 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

 Allocated by task 146:
  kasan_save_stack+0x1b/0x40
  __kasan_kmalloc+0x99/0xc0
  mlx5_init_fs+0xf0/0x1c50 [mlx5_core]
  mlx5_load+0xd2/0x180 [mlx5_core]
  mlx5_init_one+0x2f6/0x450 [mlx5_core]
  probe_one+0x47d/0x6e0 [mlx5_core]
  pci_device_probe+0x2a0/0x4a0
  really_probe+0x20a/0xc90
  driver_probe_device+0xd8/0x380
  device_driver_attach+0x1df/0x250
  __driver_attach+0xff/0x240
  bus_for_each_dev+0x11e/0x1a0
  bus_add_driver+0x309/0x570
  driver_register+0x1ee/0x380
  0xffffffffa06b8062
  do_one_initcall+0xd5/0x410
  do_init_module+0x1c8/0x760
  load_module+0x6d8b/0x9650
  __do_sys_finit_module+0x118/0x1b0
  do_syscall_64+0x2d/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xae

 Freed by task 275:
  kasan_save_stack+0x1b/0x40
  kasan_set_track+0x1c/0x30
  kasan_set_free_info+0x20/0x30
  __kasan_slab_free+0x102/0x140
  slab_free_freelist_hook+0x74/0x1b0
  kfree+0xd7/0x2a0
  mlx5_unload+0x16/0xb0 [mlx5_core]
  mlx5_unload_one+0xae/0x120 [mlx5_core]
  mlx5_devlink_reload_down+0x1bc/0x380 [mlx5_core]
  devlink_reload+0x141/0x520
  devlink_nl_cmd_reload+0x66d/0x1070
  genl_family_rcv_msg_doit+0x1e9/0x2f0
  genl_rcv_msg+0x27f/0x4a0
  netlink_rcv_skb+0x11d/0x340
  genl_rcv+0x24/0x40
  netlink_unicast+0x433/0x700
  netlink_sendmsg+0x6f1/0xbd0
  sock_sendmsg+0xb0/0xe0
  __sys_sendto+0x193/0x240
  __x64_sys_sendto+0xdd/0x1b0
  do_syscall_64+0x2d/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xae

 The buggy address belongs to the object at ffff888009d04300
  which belongs to the cache kmalloc-128 of size 128
 The buggy address is located 8 bytes inside of
  128-byte region [ffff888009d04300, ffff888009d04380)
 The buggy address belongs to the page:
 page:0000000086a64ecc refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888009d04000 pfn:0x9d04
 head:0000000086a64ecc order:1 compound_mapcount:0
 flags: 0x4000000000010200(slab|head)
 raw: 4000000000010200 ffffea0000203980 0000000200000002 ffff8880050428c0
 raw: ffff888009d04000 000000008020001d 00000001ffffffff 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff888009d04200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff888009d04280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 >ffff888009d04300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                       ^
  ffff888009d04380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff888009d04400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ==================================================================

The right solution to devlink reload is to notify about deletion of
parameters, unload driver, change net namespaces, load driver and notify
about addition of parameters.

Fixes: 070c63f20f6c ("net: devlink: allow to change namespaces during reload")
Reviewed-by: Parav Pandit <email address hidden>
Signed-off-by: Leon Romanovsky <email address hidden>
Signed-off-by: Jakub Kicinski <email address hidden>
(cherry picked from commit 05a7f4a8dff19999ca8a83a35ff4782689de7bfc)
Signed-off-by: Bodong Wang <email address hidden>

9468571... by Leon Romanovsky <email address hidden>

devlink: Remove duplicated registration check

BugLink: https://bugs.launchpad.net/bugs/1962490

Both registered flag and devlink pointer are set at the same time
and indicate the same thing - devlink/devlink_port are ready. Instead
of checking ->registered use devlink pointer as an indication.

Signed-off-by: Leon Romanovsky <email address hidden>
Signed-off-by: David S. Miller <email address hidden>
(backported from commit d7907a2b1a3b89bea136025f885035a083525e41)
Signed-off-by: Bodong Wang <email address hidden>
[bodong: resolve conflict due to new line]

Conflicts:
 net/core/devlink.c

61af07b... by Oleksandr Mazur <email address hidden>

net: core: devlink: add dropped stats traps field

BugLink: https://bugs.launchpad.net/bugs/1962490

Whenever query statistics is issued for trap, devlink subsystem
would also fill-in statistics 'dropped' field. This field indicates
the number of packets HW dropped and failed to report to the device driver,
and thus - to the devlink subsystem itself.
In case if device driver didn't register callback for hard drop
statistics querying, 'dropped' field will be omitted and not filled.

Signed-off-by: Oleksandr Mazur <email address hidden>
Signed-off-by: David S. Miller <email address hidden>
(cherry picked from commit ddee9dbc3d7aec1cd9fdcc671db2dd0016fd0f3d)
Signed-off-by: Bodong Wang <email address hidden>