-
Notifications
You must be signed in to change notification settings - Fork 76
SR-IOV broken for Connect-X4 in ethernet mode since UEK7.2 #20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I have confirmed that the same setup works properly with the latest RHCK kernel that comes with OL9.3 (5.14.0-362.8.1.el9_3.x86_64) |
I have re-tested this with 5.15.0-202.135.2.el9uek.x86_64, the issue is still present lspci 0e:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4] dmesg shows the following: dmesg |grep mlx [ 1.364318] mlx5_core 0000:0e:00.0: firmware version: 12.28.2006 |
This issue seemed to be caused by the combination of an older FW (doesn't support querying hca_cap_2 bit) and a newer upstream mlx5 driver. Hence the failure of FW CMD: QUERY_HCA_CAP(0x100) was seen in the log. If you can update the firmware that would be best. Alternatively wait until we have a kernel with Thx! |
[ Upstream commit e3e82fc ] When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be dereferenced as wrong struct in irdma_free_pending_cqp_request(). PID: 3669 TASK: ffff88aef892c000 CPU: 28 COMMAND: "kworker/28:0" #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34 #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2 #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f #3 [fffffe0000549eb8] do_nmi at ffffffff81079582 #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4 [exception RIP: native_queued_spin_lock_slowpath+1291] RIP: ffffffff8127e72b RSP: ffff88aa841ef778 RFLAGS: 00000046 RAX: 0000000000000000 RBX: ffff88b01f849700 RCX: ffffffff8127e47e RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff83857ec0 RBP: ffff88afe3e4efc8 R8: ffffed15fc7c9dfa R9: ffffed15fc7c9dfa R10: 0000000000000001 R11: ffffed15fc7c9df9 R12: 0000000000740000 R13: ffff88b01f849708 R14: 0000000000000003 R15: ffffed1603f092e1 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 -- <NMI exception stack> -- #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4 #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363 #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma] #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma] #10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma] #11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma] #12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb #13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6 #14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278 #15 [ffff88aa841efb88] device_del at ffffffff82179d23 #16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice] #17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice] #18 [ffff88aa841efde8] process_one_work at ffffffff811c589a #19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff #20 [ffff88aa841eff10] kthread at ffffffff811d87a0 #21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f Fixes: 44d9e52 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/20231130081415.891006-1-lishifeng@sangfor.com.cn Suggested-by: "Ismail, Mustafa" <mustafa.ismail@intel.com> Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 0511a9c56e5854958021de15967d879ceac5d821) Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Thank you for the update! |
Here is a bit of good news: [root@ol9 ~]# uname -r [root@ol9 ~]# dmesg |grep -e mlx -e eswitch [root@ol9 ~]# lspci |grep Mellanox Hopefully this info will make the inclusion of the patch above somewhat easier in the upcoming versions of the UEK kernel |
Perfect. Will close this ticket once a new UEK kernel comes out with the backport. |
Just curious if there is any progress on this issue ? The upstream kernel has this patch since July of 2023. |
We'll get this into the next possible UEK7 errata release. Sorry for the delay. |
This is currently scheduled for the May monthly errata release. |
[ Upstream commit 00d6a28 ] Commit a5a9230 (fbdev: fbcon: Properly revert changes when vc_resize() failed) started restoring old font data upon failure (of vc_resize()). But it performs so only for user fonts. It means that the "system"/internal fonts are not restored at all. So in result, the very first call to fbcon_do_set_font() performs no restore at all upon failing vc_resize(). This can be reproduced by Syzkaller to crash the system on the next invocation of font_get(). It's rather hard to hit the allocation failure in vc_resize() on the first font_set(), but not impossible. Esp. if fault injection is used to aid the execution/failure. It was demonstrated by Sirius: BUG: unable to handle page fault for address: fffffffffffffff8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD cb7b067 P4D cb7b067 PUD cb7d067 PMD 0 Oops: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 8007 Comm: poc Not tainted 6.7.0-g9d1694dc91ce #20 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:fbcon_get_font+0x229/0x800 drivers/video/fbdev/core/fbcon.c:2286 Call Trace: <TASK> con_font_get drivers/tty/vt/vt.c:4558 [inline] con_font_op+0x1fc/0xf20 drivers/tty/vt/vt.c:4673 vt_k_ioctl drivers/tty/vt/vt_ioctl.c:474 [inline] vt_ioctl+0x632/0x2ec0 drivers/tty/vt/vt_ioctl.c:752 tty_ioctl+0x6f8/0x1570 drivers/tty/tty_io.c:2803 vfs_ioctl fs/ioctl.c:51 [inline] ... So restore the font data in any case, not only for user fonts. Note the later 'if' is now protected by 'old_userfont' and not 'old_data' as the latter is always set now. (And it is supposed to be non-NULL. Otherwise we would see the bug above again.) Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Fixes: a5a9230 ("fbdev: fbcon: Properly revert changes when vc_resize() failed") Reported-and-tested-by: Ubisectech Sirius <bugreport@ubisectech.com> Cc: Ubisectech Sirius <bugreport@ubisectech.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Helge Deller <deller@gmx.de> Cc: linux-fbdev@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20240208114411.14604-1-jirislaby@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 20a4b5214f7bee13c897477168c77bbf79683c3d) Signed-off-by: Vijayendra Suman <vijayendra.suman@oracle.com>
systems, using igb driver, crash while executing poweroff command as per following call stack: crash> bt -a PID: 62583 TASK: ffff97ebbf28dc40 CPU: 0 COMMAND: "poweroff" #0 [ffffa7adcd64f8a0] machine_kexec at ffffffffa606c7c1 #1 [ffffa7adcd64f900] __crash_kexec at ffffffffa613bb52 #2 [ffffa7adcd64f9d0] panic at ffffffffa6099c45 #3 [ffffa7adcd64fa50] oops_end at ffffffffa603359a #4 [ffffa7adcd64fa78] die at ffffffffa6033c32 #5 [ffffa7adcd64faa8] do_trap at ffffffffa60309a0 #6 [ffffa7adcd64faf8] do_error_trap at ffffffffa60311e7 #7 [ffffa7adcd64fbc0] do_invalid_op at ffffffffa6031320 #8 [ffffa7adcd64fbd0] invalid_op at ffffffffa6a01f2a [exception RIP: free_msi_irqs+408] RIP: ffffffffa645d248 RSP: ffffa7adcd64fc88 RFLAGS: 00010286 RAX: ffff97eb1396fe00 RBX: 0000000000000000 RCX: ffff97eb1396fe00 RDX: ffff97eb1396fe00 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffa7adcd64fcb0 R8: 0000000000000002 R9: 000000000000fbff R10: 0000000000000000 R11: 0000000000000000 R12: ffff98c047af4720 R13: ffff97eb87cd32a0 R14: ffff97eb87cd3000 R15: ffffa7adcd64fd57 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffffa7adcd64fc80] free_msi_irqs at ffffffffa645d0fc #10 [ffffa7adcd64fcb8] pci_disable_msix at ffffffffa645d896 #11 [ffffa7adcd64fce0] igb_reset_interrupt_capability at ffffffffc024f335 [igb] #12 [ffffa7adcd64fd08] __igb_shutdown at ffffffffc0258ed7 [igb] #13 [ffffa7adcd64fd48] igb_shutdown at ffffffffc025908b [igb] #14 [ffffa7adcd64fd70] pci_device_shutdown at ffffffffa6441e3a #15 [ffffa7adcd64fd98] device_shutdown at ffffffffa6570260 #16 [ffffa7adcd64fdc8] kernel_power_off at ffffffffa60c0725 #17 [ffffa7adcd64fdd8] SYSC_reboot at ffffffffa60c08f1 #18 [ffffa7adcd64ff18] sys_reboot at ffffffffa60c09ee #19 [ffffa7adcd64ff28] do_syscall_64 at ffffffffa6003ca9 #20 [ffffa7adcd64ff50] entry_SYSCALL_64_after_hwframe at ffffffffa6a001b1 This happens because igb_shutdown has not yet freed up allocated irqs and free_msi_irqs finds irq_has_action true for involved msi irqs here and this condition triggers BUG_ON. Freeing irqs before proceeding further in igb_clear_interrupt_scheme, fixes this problem. This issue does not happen in v5.17 or later kernel versions because 'commit 9fb9eb4 ("PCI/MSI: Let core code free MSI descriptors")', explicitly frees up MSI based irqs and hence indirectly fixes this issue as well. But this change is dependent on framework for runtime extension of MSI-X irqs and including these changes will break the kABI because it changes some core data structures like pci_device, device and others. So in kernels prior to v5.17 we need to have this change to fix this issue, without breaking the kABI. Orabug: 36547249 Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Hello, Just a quick update: $ lspci |grep Mellanox $ dmesg |grep mlx5_core |
Hi, it was actually fixed back in https://github.com/oracle/linux-uek/commits/v5.15.0-206.149.3, but I had been waiting to update this issue until we published the RPMs. They also appear to be available now. Thank you for your patience. |
systems, using igb driver, crash while executing poweroff command as per following call stack: crash> bt -a PID: 62583 TASK: ffff97ebbf28dc40 CPU: 0 COMMAND: "poweroff" #0 [ffffa7adcd64f8a0] machine_kexec at ffffffffa606c7c1 #1 [ffffa7adcd64f900] __crash_kexec at ffffffffa613bb52 #2 [ffffa7adcd64f9d0] panic at ffffffffa6099c45 #3 [ffffa7adcd64fa50] oops_end at ffffffffa603359a #4 [ffffa7adcd64fa78] die at ffffffffa6033c32 #5 [ffffa7adcd64faa8] do_trap at ffffffffa60309a0 #6 [ffffa7adcd64faf8] do_error_trap at ffffffffa60311e7 #7 [ffffa7adcd64fbc0] do_invalid_op at ffffffffa6031320 #8 [ffffa7adcd64fbd0] invalid_op at ffffffffa6a01f2a [exception RIP: free_msi_irqs+408] RIP: ffffffffa645d248 RSP: ffffa7adcd64fc88 RFLAGS: 00010286 RAX: ffff97eb1396fe00 RBX: 0000000000000000 RCX: ffff97eb1396fe00 RDX: ffff97eb1396fe00 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffa7adcd64fcb0 R8: 0000000000000002 R9: 000000000000fbff R10: 0000000000000000 R11: 0000000000000000 R12: ffff98c047af4720 R13: ffff97eb87cd32a0 R14: ffff97eb87cd3000 R15: ffffa7adcd64fd57 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffffa7adcd64fc80] free_msi_irqs at ffffffffa645d0fc #10 [ffffa7adcd64fcb8] pci_disable_msix at ffffffffa645d896 #11 [ffffa7adcd64fce0] igb_reset_interrupt_capability at ffffffffc024f335 [igb] #12 [ffffa7adcd64fd08] __igb_shutdown at ffffffffc0258ed7 [igb] #13 [ffffa7adcd64fd48] igb_shutdown at ffffffffc025908b [igb] #14 [ffffa7adcd64fd70] pci_device_shutdown at ffffffffa6441e3a #15 [ffffa7adcd64fd98] device_shutdown at ffffffffa6570260 #16 [ffffa7adcd64fdc8] kernel_power_off at ffffffffa60c0725 #17 [ffffa7adcd64fdd8] SYSC_reboot at ffffffffa60c08f1 #18 [ffffa7adcd64ff18] sys_reboot at ffffffffa60c09ee #19 [ffffa7adcd64ff28] do_syscall_64 at ffffffffa6003ca9 #20 [ffffa7adcd64ff50] entry_SYSCALL_64_after_hwframe at ffffffffa6a001b1 This happens because igb_shutdown has not yet freed up allocated irqs and free_msi_irqs finds irq_has_action true for involved msi irqs here and this condition triggers BUG_ON. Freeing irqs before proceeding further in igb_clear_interrupt_scheme, fixes this problem. This issue does not happen in v5.17 or later kernel versions because 'commit 9fb9eb4 ("PCI/MSI: Let core code free MSI descriptors")', explicitly frees up MSI based irqs and hence indirectly fixes this issue as well. But this change is dependent on framework for runtime extension of MSI-X irqs and including these changes will break the kABI because it changes some core data structures like pci_device, device and others. So in kernels prior to v5.17 we need to have this change to fix this issue, without breaking the kABI. Orabug: 36547250 Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Sherry Yang <sherry.yang@oracle.com>
systems, using igb driver, crash while executing poweroff command as per following call stack: crash> bt -a PID: 62583 TASK: ffff97ebbf28dc40 CPU: 0 COMMAND: "poweroff" #0 [ffffa7adcd64f8a0] machine_kexec at ffffffffa606c7c1 #1 [ffffa7adcd64f900] __crash_kexec at ffffffffa613bb52 #2 [ffffa7adcd64f9d0] panic at ffffffffa6099c45 #3 [ffffa7adcd64fa50] oops_end at ffffffffa603359a #4 [ffffa7adcd64fa78] die at ffffffffa6033c32 #5 [ffffa7adcd64faa8] do_trap at ffffffffa60309a0 #6 [ffffa7adcd64faf8] do_error_trap at ffffffffa60311e7 #7 [ffffa7adcd64fbc0] do_invalid_op at ffffffffa6031320 #8 [ffffa7adcd64fbd0] invalid_op at ffffffffa6a01f2a [exception RIP: free_msi_irqs+408] RIP: ffffffffa645d248 RSP: ffffa7adcd64fc88 RFLAGS: 00010286 RAX: ffff97eb1396fe00 RBX: 0000000000000000 RCX: ffff97eb1396fe00 RDX: ffff97eb1396fe00 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffa7adcd64fcb0 R8: 0000000000000002 R9: 000000000000fbff R10: 0000000000000000 R11: 0000000000000000 R12: ffff98c047af4720 R13: ffff97eb87cd32a0 R14: ffff97eb87cd3000 R15: ffffa7adcd64fd57 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffffa7adcd64fc80] free_msi_irqs at ffffffffa645d0fc #10 [ffffa7adcd64fcb8] pci_disable_msix at ffffffffa645d896 #11 [ffffa7adcd64fce0] igb_reset_interrupt_capability at ffffffffc024f335 [igb] #12 [ffffa7adcd64fd08] __igb_shutdown at ffffffffc0258ed7 [igb] #13 [ffffa7adcd64fd48] igb_shutdown at ffffffffc025908b [igb] #14 [ffffa7adcd64fd70] pci_device_shutdown at ffffffffa6441e3a #15 [ffffa7adcd64fd98] device_shutdown at ffffffffa6570260 #16 [ffffa7adcd64fdc8] kernel_power_off at ffffffffa60c0725 #17 [ffffa7adcd64fdd8] SYSC_reboot at ffffffffa60c08f1 #18 [ffffa7adcd64ff18] sys_reboot at ffffffffa60c09ee #19 [ffffa7adcd64ff28] do_syscall_64 at ffffffffa6003ca9 #20 [ffffa7adcd64ff50] entry_SYSCALL_64_after_hwframe at ffffffffa6a001b1 This happens because igb_shutdown has not yet freed up allocated irqs and free_msi_irqs finds irq_has_action true for involved msi irqs here and this condition triggers BUG_ON. Freeing irqs before proceeding further in igb_clear_interrupt_scheme, fixes this problem. This issue does not happen in v5.17 or later kernel versions because 'commit 9fb9eb4 ("PCI/MSI: Let core code free MSI descriptors")', explicitly frees up MSI based irqs and hence indirectly fixes this issue as well. But this change is dependent on framework for runtime extension of MSI-X irqs and including these changes will break the kABI because it changes some core data structures like pci_device, device and others. So in kernels prior to v5.17 we need to have this change to fix this issue, without breaking the kABI. Orabug: 36547251 Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Orabug: 35383105 (cherry picked from commit 37c3b9f) cherry-pick-repo=kernel/git/torvalds/linux.git unmodified-from-upstream: 37c3b9f Signed-off-by: Mikhael Goikhman <migo@nvidia.com> Signed-off-by: Qing Huang <qing.huang@oracle.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Orabug: 35383105 (cherry picked from commit 37c3b9f) cherry-pick-repo=kernel/git/torvalds/linux.git unmodified-from-upstream: 37c3b9f Signed-off-by: Mikhael Goikhman <migo@nvidia.com> Signed-off-by: Qing Huang <qing.huang@oracle.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
commit bab4923 upstream. In the TRACE_EVENT(qdisc_reset) NULL dereference occurred from qdisc->dev_queue->dev <NULL> ->name This situation simulated from bunch of veths and Bluetooth disconnection and reconnection. During qdisc initialization, qdisc was being set to noop_queue. In veth_init_queue, the initial tx_num was reduced back to one, causing the qdisc reset to be called with noop, which led to the kernel panic. I've attached the GitHub gist link that C converted syz-execprogram source code and 3 log of reproduced vmcore-dmesg. https://gist.github.com/yskelg/cc64562873ce249cdd0d5a358b77d740 Yeoreum and I use two fuzzing tool simultaneously. One process with syz-executor : https://github.com/google/syzkaller $ ./syz-execprog -executor=./syz-executor -repeat=1 -sandbox=setuid \ -enable=none -collide=false log1 The other process with perf fuzzer: https://github.com/deater/perf_event_tests/tree/master/fuzzer $ perf_event_tests/fuzzer/perf_fuzzer I think this will happen on the kernel version. Linux kernel version +v6.7.10, +v6.8, +v6.9 and it could happen in v6.10. This occurred from 51270d5. I think this patch is absolutely necessary. Previously, It was showing not intended string value of name. I've reproduced 3 time from my fedora 40 Debug Kernel with any other module or patched. version: 6.10.0-0.rc2.20240608gitdc772f8237f9.29.fc41.aarch64+debug [ 5287.164555] veth0_vlan: left promiscuous mode [ 5287.164929] veth1_macvtap: left promiscuous mode [ 5287.164950] veth0_macvtap: left promiscuous mode [ 5287.164983] veth1_vlan: left promiscuous mode [ 5287.165008] veth0_vlan: left promiscuous mode [ 5287.165450] veth1_macvtap: left promiscuous mode [ 5287.165472] veth0_macvtap: left promiscuous mode [ 5287.165502] veth1_vlan: left promiscuous mode … [ 5297.598240] bridge0: port 2(bridge_slave_1) entered blocking state [ 5297.598262] bridge0: port 2(bridge_slave_1) entered forwarding state [ 5297.598296] bridge0: port 1(bridge_slave_0) entered blocking state [ 5297.598313] bridge0: port 1(bridge_slave_0) entered forwarding state [ 5297.616090] 8021q: adding VLAN 0 to HW filter on device bond0 [ 5297.620405] bridge0: port 1(bridge_slave_0) entered disabled state [ 5297.620730] bridge0: port 2(bridge_slave_1) entered disabled state [ 5297.627247] 8021q: adding VLAN 0 to HW filter on device team0 [ 5297.629636] bridge0: port 1(bridge_slave_0) entered blocking state … [ 5298.002798] bridge_slave_0: left promiscuous mode [ 5298.002869] bridge0: port 1(bridge_slave_0) entered disabled state [ 5298.309444] bond0 (unregistering): (slave bond_slave_0): Releasing backup interface [ 5298.315206] bond0 (unregistering): (slave bond_slave_1): Releasing backup interface [ 5298.320207] bond0 (unregistering): Released all slaves [ 5298.354296] hsr_slave_0: left promiscuous mode [ 5298.360750] hsr_slave_1: left promiscuous mode [ 5298.374889] veth1_macvtap: left promiscuous mode [ 5298.374931] veth0_macvtap: left promiscuous mode [ 5298.374988] veth1_vlan: left promiscuous mode [ 5298.375024] veth0_vlan: left promiscuous mode [ 5299.109741] team0 (unregistering): Port device team_slave_1 removed [ 5299.185870] team0 (unregistering): Port device team_slave_0 removed … [ 5300.155443] Bluetooth: hci3: unexpected cc 0x0c03 length: 249 > 1 [ 5300.155724] Bluetooth: hci3: unexpected cc 0x1003 length: 249 > 9 [ 5300.155988] Bluetooth: hci3: unexpected cc 0x1001 length: 249 > 9 …. [ 5301.075531] team0: Port device team_slave_1 added [ 5301.085515] bridge0: port 1(bridge_slave_0) entered blocking state [ 5301.085531] bridge0: port 1(bridge_slave_0) entered disabled state [ 5301.085588] bridge_slave_0: entered allmulticast mode [ 5301.085800] bridge_slave_0: entered promiscuous mode [ 5301.095617] bridge0: port 1(bridge_slave_0) entered blocking state [ 5301.095633] bridge0: port 1(bridge_slave_0) entered disabled state … [ 5301.149734] bond0: (slave bond_slave_0): Enslaving as an active interface with an up link [ 5301.173234] bond0: (slave bond_slave_0): Enslaving as an active interface with an up link [ 5301.180517] bond0: (slave bond_slave_1): Enslaving as an active interface with an up link [ 5301.193481] hsr_slave_0: entered promiscuous mode [ 5301.204425] hsr_slave_1: entered promiscuous mode [ 5301.210172] debugfs: Directory 'hsr0' with parent 'hsr' already present! [ 5301.210185] Cannot create hsr debugfs directory [ 5301.224061] bond0: (slave bond_slave_1): Enslaving as an active interface with an up link [ 5301.246901] bond0: (slave bond_slave_0): Enslaving as an active interface with an up link [ 5301.255934] team0: Port device team_slave_0 added [ 5301.256480] team0: Port device team_slave_1 added [ 5301.256948] team0: Port device team_slave_0 added … [ 5301.435928] hsr_slave_0: entered promiscuous mode [ 5301.446029] hsr_slave_1: entered promiscuous mode [ 5301.455872] debugfs: Directory 'hsr0' with parent 'hsr' already present! [ 5301.455884] Cannot create hsr debugfs directory [ 5301.502664] hsr_slave_0: entered promiscuous mode [ 5301.513675] hsr_slave_1: entered promiscuous mode [ 5301.526155] debugfs: Directory 'hsr0' with parent 'hsr' already present! [ 5301.526164] Cannot create hsr debugfs directory [ 5301.563662] hsr_slave_0: entered promiscuous mode [ 5301.576129] hsr_slave_1: entered promiscuous mode [ 5301.580259] debugfs: Directory 'hsr0' with parent 'hsr' already present! [ 5301.580270] Cannot create hsr debugfs directory [ 5301.590269] 8021q: adding VLAN 0 to HW filter on device bond0 [ 5301.595872] KASAN: null-ptr-deref in range [0x0000000000000130-0x0000000000000137] [ 5301.595877] Mem abort info: [ 5301.595881] ESR = 0x0000000096000006 [ 5301.595885] EC = 0x25: DABT (current EL), IL = 32 bits [ 5301.595889] SET = 0, FnV = 0 [ 5301.595893] EA = 0, S1PTW = 0 [ 5301.595896] FSC = 0x06: level 2 translation fault [ 5301.595900] Data abort info: [ 5301.595903] ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000 [ 5301.595907] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 5301.595911] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 5301.595915] [dfff800000000026] address between user and kernel address ranges [ 5301.595971] Internal error: Oops: 0000000096000006 [#1] SMP … [ 5301.596076] CPU: 2 PID: 102769 Comm: syz-executor.3 Kdump: loaded Tainted: G W ------- --- 6.10.0-0.rc2.20240608gitdc772f8237f9.29.fc41.aarch64+debug #1 [ 5301.596080] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.21805430.BA64.2305221830 05/22/2023 [ 5301.596082] pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 5301.596085] pc : strnlen+0x40/0x88 [ 5301.596114] lr : trace_event_get_offsets_qdisc_reset+0x6c/0x2b0 [ 5301.596124] sp : ffff8000beef6b40 [ 5301.596126] x29: ffff8000beef6b40 x28: dfff800000000000 x27: 0000000000000001 [ 5301.596131] x26: 6de1800082c62bd0 x25: 1ffff000110aa9e0 x24: ffff800088554f00 [ 5301.596136] x23: ffff800088554ec0 x22: 0000000000000130 x21: 0000000000000140 [ 5301.596140] x20: dfff800000000000 x19: ffff8000beef6c60 x18: ffff7000115106d8 [ 5301.596143] x17: ffff800121bad000 x16: ffff800080020000 x15: 0000000000000006 [ 5301.596147] x14: 0000000000000002 x13: ffff0001f3ed8d14 x12: ffff700017ddeda5 [ 5301.596151] x11: 1ffff00017ddeda4 x10: ffff700017ddeda4 x9 : ffff800082cc5eec [ 5301.596155] x8 : 0000000000000004 x7 : 00000000f1f1f1f1 x6 : 00000000f2f2f200 [ 5301.596158] x5 : 00000000f3f3f3f3 x4 : ffff700017dded80 x3 : 00000000f204f1f1 [ 5301.596162] x2 : 0000000000000026 x1 : 0000000000000000 x0 : 0000000000000130 [ 5301.596166] Call trace: [ 5301.596175] strnlen+0x40/0x88 [ 5301.596179] trace_event_get_offsets_qdisc_reset+0x6c/0x2b0 [ 5301.596182] perf_trace_qdisc_reset+0xb0/0x538 [ 5301.596184] __traceiter_qdisc_reset+0x68/0xc0 [ 5301.596188] qdisc_reset+0x43c/0x5e8 [ 5301.596190] netif_set_real_num_tx_queues+0x288/0x770 [ 5301.596194] veth_init_queues+0xfc/0x130 [veth] [ 5301.596198] veth_newlink+0x45c/0x850 [veth] [ 5301.596202] rtnl_newlink_create+0x2c8/0x798 [ 5301.596205] __rtnl_newlink+0x92c/0xb60 [ 5301.596208] rtnl_newlink+0xd8/0x130 [ 5301.596211] rtnetlink_rcv_msg+0x2e0/0x890 [ 5301.596214] netlink_rcv_skb+0x1c4/0x380 [ 5301.596225] rtnetlink_rcv+0x20/0x38 [ 5301.596227] netlink_unicast+0x3c8/0x640 [ 5301.596231] netlink_sendmsg+0x658/0xa60 [ 5301.596234] __sock_sendmsg+0xd0/0x180 [ 5301.596243] __sys_sendto+0x1c0/0x280 [ 5301.596246] __arm64_sys_sendto+0xc8/0x150 [ 5301.596249] invoke_syscall+0xdc/0x268 [ 5301.596256] el0_svc_common.constprop.0+0x16c/0x240 [ 5301.596259] do_el0_svc+0x48/0x68 [ 5301.596261] el0_svc+0x50/0x188 [ 5301.596265] el0t_64_sync_handler+0x120/0x130 [ 5301.596268] el0t_64_sync+0x194/0x198 [ 5301.596272] Code: eb15001f 54000120 d343fc02 12000801 (38f46842) [ 5301.596285] SMP: stopping secondary CPUs [ 5301.597053] Starting crashdump kernel... [ 5301.597057] Bye! After applying our patch, I didn't find any kernel panic errors. We've found a simple reproducer # echo 1 > /sys/kernel/debug/tracing/events/qdisc/qdisc_reset/enable # ip link add veth0 type veth peer name veth1 Error: Unknown device type. However, without our patch applied, I tested upstream 6.10.0-rc3 kernel using the qdisc_reset event and the ip command on my qemu virtual machine. This 2 commands makes always kernel panic. Linux version: 6.10.0-rc3 [ 0.000000] Linux version 6.10.0-rc3-00164-g44ef20baed8e-dirty (paran@fedora) (gcc (GCC) 14.1.1 20240522 (Red Hat 14.1.1-4), GNU ld version 2.41-34.fc40) #20 SMP PREEMPT Sat Jun 15 16:51:25 KST 2024 Kernel panic message: [ 615.236484] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP [ 615.237250] Dumping ftrace buffer: [ 615.237679] (ftrace buffer empty) [ 615.238097] Modules linked in: veth crct10dif_ce virtio_gpu virtio_dma_buf drm_shmem_helper drm_kms_helper zynqmp_fpga xilinx_can xilinx_spi xilinx_selectmap xilinx_core xilinx_pr_decoupler versal_fpga uvcvideo uvc videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common mc usbnet deflate zstd ubifs ubi rcar_canfd rcar_can omap_mailbox ntb_msi_test ntb_hw_epf lattice_sysconfig_spi lattice_sysconfig ice40_spi gpio_xilinx dwmac_altr_socfpga mdio_regmap stmmac_platform stmmac pcs_xpcs dfl_fme_region dfl_fme_mgr dfl_fme_br dfl_afu dfl fpga_region fpga_bridge can can_dev br_netfilter bridge stp llc atl1c ath11k_pci mhi ath11k_ahb ath11k qmi_helpers ath10k_sdio ath10k_pci ath10k_core ath mac80211 libarc4 cfg80211 drm fuse backlight ipv6 Jun 22 02:36:5[3 6k152.62-4sm98k4-0k]v kCePUr:n e1l :P IUDn:a b4le6 8t oC ohmma: nidpl eN oketr nteali nptaedg i6n.g1 0re.0q-urecs3t- 0at0 1v6i4r-tgu4a4le fa2d0dbraeeds0se-dir tyd f#f2f08 615.252376] Hardware name: linux,dummy-virt (DT) [ 615.253220] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 615.254433] pc : strnlen+0x6c/0xe0 [ 615.255096] lr : trace_event_get_offsets_qdisc_reset+0x94/0x3d0 [ 615.256088] sp : ffff800080b269a0 [ 615.256615] x29: ffff800080b269a0 x28: ffffc070f3f98500 x27: 0000000000000001 [ 615.257831] x26: 0000000000000010 x25: ffffc070f3f98540 x24: ffffc070f619cf60 [ 615.259020] x23: 0000000000000128 x22: 0000000000000138 x21: dfff800000000000 [ 615.260241] x20: ffffc070f631ad00 x19: 0000000000000128 x18: ffffc070f448b800 [ 615.261454] x17: 0000000000000000 x16: 0000000000000001 x15: ffffc070f4ba2a90 [ 615.262635] x14: ffff700010164d73 x13: 1ffff80e1e8d5eb3 x12: 1ffff00010164d72 [ 615.263877] x11: ffff700010164d72 x10: dfff800000000000 x9 : ffffc070e85d6184 [ 615.265047] x8 : ffffc070e4402070 x7 : 000000000000f1f1 x6 : 000000001504a6d3 [ 615.266336] x5 : ffff28ca21122140 x4 : ffffc070f5043ea8 x3 : 0000000000000000 [ 615.267528] x2 : 0000000000000025 x1 : 0000000000000000 x0 : 0000000000000000 [ 615.268747] Call trace: [ 615.269180] strnlen+0x6c/0xe0 [ 615.269767] trace_event_get_offsets_qdisc_reset+0x94/0x3d0 [ 615.270716] trace_event_raw_event_qdisc_reset+0xe8/0x4e8 [ 615.271667] __traceiter_qdisc_reset+0xa0/0x140 [ 615.272499] qdisc_reset+0x554/0x848 [ 615.273134] netif_set_real_num_tx_queues+0x360/0x9a8 [ 615.274050] veth_init_queues+0x110/0x220 [veth] [ 615.275110] veth_newlink+0x538/0xa50 [veth] [ 615.276172] __rtnl_newlink+0x11e4/0x1bc8 [ 615.276944] rtnl_newlink+0xac/0x120 [ 615.277657] rtnetlink_rcv_msg+0x4e4/0x1370 [ 615.278409] netlink_rcv_skb+0x25c/0x4f0 [ 615.279122] rtnetlink_rcv+0x48/0x70 [ 615.279769] netlink_unicast+0x5a8/0x7b8 [ 615.280462] netlink_sendmsg+0xa70/0x1190 Yeoreum and I don't know if the patch we wrote will fix the underlying cause, but we think that priority is to prevent kernel panic happening. So, we're sending this patch. Fixes: 51270d5 ("tracing/net_sched: Fix tracepoints that save qdisc_dev() as a string") Link: https://lore.kernel.org/lkml/20240229143432.273b4871@gandalf.local.home/t/ Cc: netdev@vger.kernel.org Tested-by: Yunseong Kim <yskelg@gmail.com> Signed-off-by: Yunseong Kim <yskelg@gmail.com> Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Link: https://lore.kernel.org/r/20240624173320.24945-4-yskelg@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 69cdccf5372542a7ed1c8035e01a97cefcf0ca0a) FOF: 0824 Signed-off-by: Vijayendra Suman <vijayendra.suman@oracle.com>
Add a check to mlx5e_xmit() for shorter frames. A corrupted/malformed packet, with shorter length can eventually cause system panic further down in the code path. Avoid it by validating the length and dropping it at the earliest. Following is seen in our env with shorter skb->len crash> bt PID: 76981 TASK: ff19828cfe508000 CPU: 106 COMMAND: "vhost-76942" #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd [exception RIP: memcpy_erms+6] RIP: ffffffffae261ab6 RSP: ff2d20159b39f700 RFLAGS: 00010293 RAX: ff198291741ecf2e RBX: ff19828e70d6a100 RCX: fffffffffea1af2b RDX: fffffffffffffffd RSI: ff19828eba6d7e5e RDI: ff198291757d2000 RBP: ff2d20159b39f760 R8: ff198291741ecf00 R9: 000000000000037c R10: 000000000000003c R11: ff19828ffe953940 R12: ff198291741ecf20 R13: ff198267dcb1b600 R14: ff19828eeebb09c0 R15: ff198291741ecf00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core] #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core] #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q] #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan] #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap] #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net] #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net] #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net] #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net] #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost] #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364 This change was discussed with Nvidia and they are in agreement. Orabug: 36879156 CVE: CVE-2024-41090 CVE: CVE-2024-41091 Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration") Reported-and-tested-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Add a check to mlx5e_xmit() for shorter frames. A corrupted/malformed packet, with shorter length can eventually cause system panic further down in the code path. Avoid it by validating the length and dropping it at the earliest. Following is seen in our env with shorter skb->len crash> bt PID: 76981 TASK: ff19828cfe508000 CPU: 106 COMMAND: "vhost-76942" #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd [exception RIP: memcpy_erms+6] RIP: ffffffffae261ab6 RSP: ff2d20159b39f700 RFLAGS: 00010293 RAX: ff198291741ecf2e RBX: ff19828e70d6a100 RCX: fffffffffea1af2b RDX: fffffffffffffffd RSI: ff19828eba6d7e5e RDI: ff198291757d2000 RBP: ff2d20159b39f760 R8: ff198291741ecf00 R9: 000000000000037c R10: 000000000000003c R11: ff19828ffe953940 R12: ff198291741ecf20 R13: ff198267dcb1b600 R14: ff19828eeebb09c0 R15: ff198291741ecf00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core] #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core] #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q] #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan] #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap] #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net] #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net] #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net] #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net] #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost] #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364 This change was discussed with Nvidia and they are in agreement. Orabug: 36879157 CVE: CVE-2024-41090 CVE: CVE-2024-41091 Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration") Reported-and-tested-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> (cherry picked from commit e7fd2c2) Signed-off-by: Sherry Yang <sherry.yang@oracle.com>
Add a check to mlx5e_xmit() for shorter frames. A corrupted/malformed packet, with shorter length can eventually cause system panic further down in the code path. Avoid it by validating the length and dropping it at the earliest. Following is seen in our env with shorter skb->len crash> bt PID: 76981 TASK: ff19828cfe508000 CPU: 106 COMMAND: "vhost-76942" #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd [exception RIP: memcpy_erms+6] RIP: ffffffffae261ab6 RSP: ff2d20159b39f700 RFLAGS: 00010293 RAX: ff198291741ecf2e RBX: ff19828e70d6a100 RCX: fffffffffea1af2b RDX: fffffffffffffffd RSI: ff19828eba6d7e5e RDI: ff198291757d2000 RBP: ff2d20159b39f760 R8: ff198291741ecf00 R9: 000000000000037c R10: 000000000000003c R11: ff19828ffe953940 R12: ff198291741ecf20 R13: ff198267dcb1b600 R14: ff19828eeebb09c0 R15: ff198291741ecf00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core] #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core] #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q] #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan] #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap] #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net] #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net] #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net] #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net] #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost] #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364 This change was discussed with Nvidia and they are in agreement. Orabug: 36879158 CVE: CVE-2024-41090 CVE: CVE-2024-41091 Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration") Reported-and-tested-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
Add a check to mlx5e_xmit() for shorter frames. A corrupted/malformed packet, with shorter length can eventually cause system panic further down in the code path. Avoid it by validating the length and dropping it at the earliest. Following is seen in our env with shorter skb->len crash> bt PID: 76981 TASK: ff19828cfe508000 CPU: 106 COMMAND: "vhost-76942" #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd [exception RIP: memcpy_erms+6] RIP: ffffffffae261ab6 RSP: ff2d20159b39f700 RFLAGS: 00010293 RAX: ff198291741ecf2e RBX: ff19828e70d6a100 RCX: fffffffffea1af2b RDX: fffffffffffffffd RSI: ff19828eba6d7e5e RDI: ff198291757d2000 RBP: ff2d20159b39f760 R8: ff198291741ecf00 R9: 000000000000037c R10: 000000000000003c R11: ff19828ffe953940 R12: ff198291741ecf20 R13: ff198267dcb1b600 R14: ff19828eeebb09c0 R15: ff198291741ecf00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core] #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core] #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q] #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan] #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap] #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net] #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net] #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net] #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net] #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost] #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364 This change was discussed with Nvidia and they are in agreement. Orabug: 36879159 CVE: CVE-2024-41090 CVE: CVE-2024-41091 Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration") Reported-and-tested-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> In UEK4 stats is not a pointer, change the dropped code. Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
commit be346c1 upstream. The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents. Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem. To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written(). Heming Zhao said: ------ PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error" PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA" #0 machine_kexec at ffffffff8c069932 #1 __crash_kexec at ffffffff8c1338fa #2 panic at ffffffff8c1d69b9 #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2] #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2] #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2] #6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2] #7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2] #8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2] #9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2] #10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2] #11 dio_complete at ffffffff8c2b9fa7 #12 do_blockdev_direct_IO at ffffffff8c2bc09f #13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2] #14 generic_file_direct_write at ffffffff8c1dcf14 #15 __generic_file_write_iter at ffffffff8c1dd07b #16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2] #17 aio_write at ffffffff8c2cc72e #18 kmem_cache_alloc at ffffffff8c248dde #19 do_io_submit at ffffffff8c2ccada #20 do_syscall_64 at ffffffff8c004984 #21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 320273b5649bbcee87f9e65343077189699d2a7a) Signed-off-by: Vijayendra Suman <vijayendra.suman@oracle.com>
[ Upstream commit db19d3a ] There is a race condition in the CMT interrupt handler. In the interrupt handler the driver sets a driver private flag, FLAG_IRQCONTEXT. This flag is used to indicate any call to set_next_event() should not be directly propagated to the device, but instead cached. This is done as the interrupt handler itself reprograms the device when needed before it completes and this avoids this operation to take place twice. It is unclear why this design was chosen, my suspicion is to allow the struct clock_event_device.event_handler callback, which is called while the FLAG_IRQCONTEXT is set, can update the next event without having to write to the device twice. Unfortunately there is a race between when the FLAG_IRQCONTEXT flag is set and later cleared where the interrupt handler have already started to write the next event to the device. If set_next_event() is called in this window the value is only cached in the driver but not written. This leads to the board to misbehave, or worse lockup and produce a splat. rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: rcu: 0-...!: (0 ticks this GP) idle=f5e0/0/0x0 softirq=519/519 fqs=0 (false positive?) rcu: (detected by 1, t=6502 jiffies, g=-595, q=77 ncpus=2) Sending NMI from CPU 1 to CPUs 0: NMI backtrace for cpu 0 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.10.0-rc5-arm64-renesas-00019-g74a6f86eaf1c-dirty #20 Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT) pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : tick_check_broadcast_expired+0xc/0x40 lr : cpu_idle_poll.isra.0+0x8c/0x168 sp : ffff800081c63d70 x29: ffff800081c63d70 x28: 00000000580000c8 x27: 00000000bfee5610 x26: 0000000000000027 x25: 0000000000000000 x24: 0000000000000000 x23: ffff00007fbb9100 x22: ffff8000818f1008 x21: ffff8000800ef07c x20: ffff800081c79ec0 x19: ffff800081c70c28 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffc2c717d8 x14: 0000000000000000 x13: ffff000009c18080 x12: ffff8000825f7fc0 x11: 0000000000000000 x10: ffff8000818f3cd4 x9 : 0000000000000028 x8 : ffff800081c79ec0 x7 : ffff800081c73000 x6 : 0000000000000000 x5 : 0000000000000000 x4 : ffff7ffffe286000 x3 : 0000000000000000 x2 : ffff7ffffe286000 x1 : ffff800082972900 x0 : ffff8000818f1008 Call trace: tick_check_broadcast_expired+0xc/0x40 do_idle+0x9c/0x280 cpu_startup_entry+0x34/0x40 kernel_init+0x0/0x11c do_one_initcall+0x0/0x260 __primary_switched+0x80/0x88 rcu: rcu_preempt kthread timer wakeup didn't happen for 6501 jiffies! g-595 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 rcu: Possible timer handling issue on cpu=0 timer-softirq=262 rcu: rcu_preempt kthread starved for 6502 jiffies! g-595 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0 rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior. rcu: RCU grace-period kthread stack dump: task:rcu_preempt state:I stack:0 pid:15 tgid:15 ppid:2 flags:0x00000008 Call trace: __switch_to+0xbc/0x100 __schedule+0x358/0xbe0 schedule+0x48/0x148 schedule_timeout+0xc4/0x138 rcu_gp_fqs_loop+0x12c/0x764 rcu_gp_kthread+0x208/0x298 kthread+0x10c/0x110 ret_from_fork+0x10/0x20 The design have been part of the driver since it was first merged in early 2009. It becomes increasingly harder to trigger the issue the older kernel version one tries. It only takes a few boots on v6.10-rc5, while hundreds of boots are needed to trigger it on v5.10. Close the race condition by using the CMT channel lock for the two competing sections. The channel lock was added to the driver after its initial design. Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Link: https://lore.kernel.org/r/20240702190230.3825292-1-niklas.soderlund+renesas@ragnatech.se Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 4c8f911560381ed989af10b6ed7b8b98a928a3a6) Signed-off-by: Sherry Yang <sherry.yang@oracle.com>
[ Upstream commit db19d3a ] There is a race condition in the CMT interrupt handler. In the interrupt handler the driver sets a driver private flag, FLAG_IRQCONTEXT. This flag is used to indicate any call to set_next_event() should not be directly propagated to the device, but instead cached. This is done as the interrupt handler itself reprograms the device when needed before it completes and this avoids this operation to take place twice. It is unclear why this design was chosen, my suspicion is to allow the struct clock_event_device.event_handler callback, which is called while the FLAG_IRQCONTEXT is set, can update the next event without having to write to the device twice. Unfortunately there is a race between when the FLAG_IRQCONTEXT flag is set and later cleared where the interrupt handler have already started to write the next event to the device. If set_next_event() is called in this window the value is only cached in the driver but not written. This leads to the board to misbehave, or worse lockup and produce a splat. rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: rcu: 0-...!: (0 ticks this GP) idle=f5e0/0/0x0 softirq=519/519 fqs=0 (false positive?) rcu: (detected by 1, t=6502 jiffies, g=-595, q=77 ncpus=2) Sending NMI from CPU 1 to CPUs 0: NMI backtrace for cpu 0 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.10.0-rc5-arm64-renesas-00019-g74a6f86eaf1c-dirty #20 Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT) pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : tick_check_broadcast_expired+0xc/0x40 lr : cpu_idle_poll.isra.0+0x8c/0x168 sp : ffff800081c63d70 x29: ffff800081c63d70 x28: 00000000580000c8 x27: 00000000bfee5610 x26: 0000000000000027 x25: 0000000000000000 x24: 0000000000000000 x23: ffff00007fbb9100 x22: ffff8000818f1008 x21: ffff8000800ef07c x20: ffff800081c79ec0 x19: ffff800081c70c28 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffc2c717d8 x14: 0000000000000000 x13: ffff000009c18080 x12: ffff8000825f7fc0 x11: 0000000000000000 x10: ffff8000818f3cd4 x9 : 0000000000000028 x8 : ffff800081c79ec0 x7 : ffff800081c73000 x6 : 0000000000000000 x5 : 0000000000000000 x4 : ffff7ffffe286000 x3 : 0000000000000000 x2 : ffff7ffffe286000 x1 : ffff800082972900 x0 : ffff8000818f1008 Call trace: tick_check_broadcast_expired+0xc/0x40 do_idle+0x9c/0x280 cpu_startup_entry+0x34/0x40 kernel_init+0x0/0x11c do_one_initcall+0x0/0x260 __primary_switched+0x80/0x88 rcu: rcu_preempt kthread timer wakeup didn't happen for 6501 jiffies! g-595 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 rcu: Possible timer handling issue on cpu=0 timer-softirq=262 rcu: rcu_preempt kthread starved for 6502 jiffies! g-595 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0 rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior. rcu: RCU grace-period kthread stack dump: task:rcu_preempt state:I stack:0 pid:15 tgid:15 ppid:2 flags:0x00000008 Call trace: __switch_to+0xbc/0x100 __schedule+0x358/0xbe0 schedule+0x48/0x148 schedule_timeout+0xc4/0x138 rcu_gp_fqs_loop+0x12c/0x764 rcu_gp_kthread+0x208/0x298 kthread+0x10c/0x110 ret_from_fork+0x10/0x20 The design have been part of the driver since it was first merged in early 2009. It becomes increasingly harder to trigger the issue the older kernel version one tries. It only takes a few boots on v6.10-rc5, while hundreds of boots are needed to trigger it on v5.10. Close the race condition by using the CMT channel lock for the two competing sections. The channel lock was added to the driver after its initial design. Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Link: https://lore.kernel.org/r/20240702190230.3825292-1-niklas.soderlund+renesas@ragnatech.se Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 026befb502ce41384e5119df12c9f2d4067cb23c) Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> (cherry picked from commit f9ec6971715991696e49430547551697f1153be6) Signed-off-by: Yifei Liu <yifei.l.liu@oracle.com>
Add a check to mlx5e_xmit() for shorter frames. A corrupted/malformed packet, with shorter length can eventually cause system panic further down in the code path. Avoid it by validating the length and dropping it at the earliest. Following is seen in our env with shorter skb->len crash> bt PID: 76981 TASK: ff19828cfe508000 CPU: 106 COMMAND: "vhost-76942" #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd [exception RIP: memcpy_erms+6] RIP: ffffffffae261ab6 RSP: ff2d20159b39f700 RFLAGS: 00010293 RAX: ff198291741ecf2e RBX: ff19828e70d6a100 RCX: fffffffffea1af2b RDX: fffffffffffffffd RSI: ff19828eba6d7e5e RDI: ff198291757d2000 RBP: ff2d20159b39f760 R8: ff198291741ecf00 R9: 000000000000037c R10: 000000000000003c R11: ff19828ffe953940 R12: ff198291741ecf20 R13: ff198267dcb1b600 R14: ff19828eeebb09c0 R15: ff198291741ecf00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core] #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core] #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q] #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan] #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap] #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net] #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net] #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net] #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net] #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost] #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364 This change was discussed with Nvidia and they are in agreement. Orabug: 36879156 CVE: CVE-2024-41090 CVE: CVE-2024-41091 Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration") Reported-and-tested-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com> (cherry picked from commit 0dd4b99) Orabug: 36879126 CVE: CVE-2024-41090 CVE: CVE-2024-41091 Signed-off-by: Harshvardhan Jha <harshvardhan.j.jha@oracle.com> Reviewed-by: Vijayendra Suman <vijayendra.suman@oracle.com>
…ress ACPICA commit c14708336bd18552b28643575de7b5beb9b864e9 Before this change we see the following UBSAN stack trace in Fuchsia: #0 0x0000220c98288eba in acpi_rs_get_address_common(struct acpi_resource*, union aml_resource*) ../../third_party/acpica/source/components/resources/rsaddr.c:331 <platform-bus-x86.so>+0x8f6eba #1.2 0x000023625f46077f in ubsan_get_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:41 <libclang_rt.asan.so>+0x3d77f #1.1 0x000023625f46077f in maybe_print_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:51 <libclang_rt.asan.so>+0x3d77f #1 0x000023625f46077f in ~scoped_report() compiler-rt/lib/ubsan/ubsan_diag.cpp:387 <libclang_rt.asan.so>+0x3d77f #2 0x000023625f461385 in handletype_mismatch_impl() compiler-rt/lib/ubsan/ubsan_handlers.cpp:137 <libclang_rt.asan.so>+0x3e385 #3 0x000023625f460ead in compiler-rt/lib/ubsan/ubsan_handlers.cpp:142 <libclang_rt.asan.so>+0x3dead #4 0x0000220c98288eba in acpi_rs_get_address_common(struct acpi_resource*, union aml_resource*) ../../third_party/acpica/source/components/resources/rsaddr.c:331 <platform-bus-x86.so>+0x8f6eba #5 0x0000220c9828ea57 in acpi_rs_convert_aml_to_resource(struct acpi_resource*, union aml_resource*, struct acpi_rsconvert_info*) ../../third_party/acpica/source/components/resources/rsmisc.c:352 <platform-bus-x86.so>+0x8fca57 #6 0x0000220c9828992c in acpi_rs_convert_aml_to_resources(u8*, u32, u32, u8, void**) ../../third_party/acpica/source/components/resources/rslist.c:132 <platform-bus-x86.so>+0x8f792c #7 0x0000220c982d1cfc in acpi_ut_walk_aml_resources(struct acpi_walk_state*, u8*, acpi_size, acpi_walk_aml_callback, void**) ../../third_party/acpica/source/components/utilities/utresrc.c:234 <platform-bus-x86.so>+0x93fcfc #8 0x0000220c98281e46 in acpi_rs_create_resource_list(union acpi_operand_object*, struct acpi_buffer*) ../../third_party/acpica/source/components/resources/rscreate.c:199 <platform-bus-x86.so>+0x8efe46 #9 0x0000220c98293b51 in acpi_rs_get_method_data(acpi_handle, const char*, struct acpi_buffer*) ../../third_party/acpica/source/components/resources/rsutils.c:770 <platform-bus-x86.so>+0x901b51 #10 0x0000220c9829438d in acpi_walk_resources(acpi_handle, char*, acpi_walk_resource_callback, void*) ../../third_party/acpica/source/components/resources/rsxface.c:731 <platform-bus-x86.so>+0x90238d #11 0x0000220c97db272b in acpi::acpi_impl::walk_resources(acpi::acpi_impl*, acpi_handle, const char*, acpi::Acpi::resources_callable) ../../src/devices/board/lib/acpi/acpi-impl.cc:41 <platform-bus-x86.so>+0x42072b #12 0x0000220c97dcec59 in acpi::device_builder::gather_resources(acpi::device_builder*, acpi::Acpi*, fidl::any_arena&, acpi::Manager*, acpi::device_builder::gather_resources_callback) ../../src/devices/board/lib/acpi/device-builder.cc:52 <platform-bus-x86.so>+0x43cc59 #13 0x0000220c97f94a3f in acpi::Manager::configure_discovered_devices(acpi::Manager*) ../../src/devices/board/lib/acpi/manager.cc:75 <platform-bus-x86.so>+0x602a3f #14 0x0000220c97c642c7 in publish_acpi_devices(acpi::Manager*, zx_device_t*, zx_device_t*) ../../src/devices/board/drivers/x86/acpi-nswalk.cc:102 <platform-bus-x86.so>+0x2d22c7 #15 0x0000220c97caf3e6 in x86::X86::do_init(x86::X86*) ../../src/devices/board/drivers/x86/x86.cc:65 <platform-bus-x86.so>+0x31d3e6 #16 0x0000220c97cd72ae in λ(x86::X86::ddk_init::(anon class)*) ../../src/devices/board/drivers/x86/x86.cc:82 <platform-bus-x86.so>+0x3452ae #17 0x0000220c97cd7223 in fit::internal::target<(lambda at../../src/devices/board/drivers/x86/x86.cc:81:19), false, false, void>::invoke(void*) ../../sdk/lib/fit/include/lib/fit/internal/function.h:181 <platform-bus-x86.so>+0x345223 #18 0x0000220c97f48eb0 in fit::internal::function_base<16UL, false, void()>::invoke(const fit::internal::function_base<16UL, false, void ()>*) ../../sdk/lib/fit/include/lib/fit/internal/function.h:505 <platform-bus-x86.so>+0x5b6eb0 #19 0x0000220c97f48d2a in fit::function_impl<16UL, false, void()>::operator()(const fit::function_impl<16UL, false, void ()>*) ../../sdk/lib/fit/include/lib/fit/function.h:300 <platform-bus-x86.so>+0x5b6d2a #20 0x0000220c982f9245 in async::internal::retained_task::Handler(async_dispatcher_t*, async_task_t*, zx_status_t) ../../zircon/system/ulib/async/task.cc:25 <platform-bus-x86.so>+0x967245 #21 0x000022e2aa1cd91e in λ(const driver_runtime::Dispatcher::post_task::(anon class)*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, zx_status_t) ../../src/devices/bin/driver_runtime/dispatcher.cc:715 <libdriver_runtime.so>+0xed91e #22 0x000022e2aa1cd621 in fit::internal::target<(lambda at../../src/devices/bin/driver_runtime/dispatcher.cc:714:7), true, false, void, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int>::invoke(void*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/internal/function.h:128 <libdriver_runtime.so>+0xed621 #23 0x000022e2aa1a8482 in fit::internal::function_base<24UL, true, void(std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int)>::invoke(const fit::internal::function_base<24UL, true, void (std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int)>*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/internal/function.h:505 <libdriver_runtime.so>+0xc8482 #24 0x000022e2aa1a80f8 in fit::callback_impl<24UL, true, void(std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int)>::operator()(fit::callback_impl<24UL, true, void (std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int)>*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/function.h:451 <libdriver_runtime.so>+0xc80f8 #25 0x000022e2aa17fc76 in driver_runtime::callback_request::Call(driver_runtime::callback_request*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, zx_status_t) ../../src/devices/bin/driver_runtime/callback_request.h:67 <libdriver_runtime.so>+0x9fc76 #26 0x000022e2aa18c7ef in driver_runtime::Dispatcher::dispatch_callback(driver_runtime::Dispatcher*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >) ../../src/devices/bin/driver_runtime/dispatcher.cc:1093 <libdriver_runtime.so>+0xac7ef #27 0x000022e2aa18fd67 in driver_runtime::Dispatcher::dispatch_callbacks(driver_runtime::Dispatcher*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.cc:1169 <libdriver_runtime.so>+0xafd67 #28 0x000022e2aa1bc9a2 in λ(const driver_runtime::Dispatcher::create_with_adder::(anon class)*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.cc:338 <libdriver_runtime.so>+0xdc9a2 #29 0x000022e2aa1bc6d2 in fit::internal::target<(lambda at../../src/devices/bin/driver_runtime/dispatcher.cc:337:7), true, false, void, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>>::invoke(void*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/internal/function.h:128 <libdriver_runtime.so>+0xdc6d2 #30 0x000022e2aa1aa1e5 in fit::internal::function_base<8UL, true, void(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>)>::invoke(const fit::internal::function_base<8UL, true, void (std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>)>*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/internal/function.h:505 <libdriver_runtime.so>+0xca1e5 #31 0x000022e2aa1a9e32 in fit::function_impl<8UL, true, void(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>)>::operator()(const fit::function_impl<8UL, true, void (std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>)>*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/function.h:300 <libdriver_runtime.so>+0xc9e32 #32 0x000022e2aa193444 in driver_runtime::Dispatcher::event_waiter::invoke_callback(driver_runtime::Dispatcher::event_waiter*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.h:299 <libdriver_runtime.so>+0xb3444 #33 0x000022e2aa192feb in driver_runtime::Dispatcher::event_waiter::handle_event(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, async_dispatcher_t*, async::wait_base*, zx_status_t, zx_packet_signal_t const*) ../../src/devices/bin/driver_runtime/dispatcher.cc:1259 <libdriver_runtime.so>+0xb2feb #34 0x000022e2aa1bcf74 in async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>::handle_event(async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>*, async_dispatcher_t*, async::wait_base*, zx_status_t, zx_packet_signal_t const*) ../../src/devices/bin/driver_runtime/async_loop_owned_event_handler.h:59 <libdriver_runtime.so>+0xdcf74 #35 0x000022e2aa1bd1cb in async::wait_method<async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>, &async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>::handle_event>::call_handler(async_dispatcher_t*, async_wait_t*, zx_status_t, zx_packet_signal_t const*) ../../zircon/system/ulib/async/include/lib/async/cpp/wait.h:201 <libdriver_runtime.so>+0xdd1cb #36 0x000022e2aa2303a9 in async_loop_dispatch_wait(async_loop_t*, async_wait_t*, zx_status_t, zx_packet_signal_t const*) ../../zircon/system/ulib/async-loop/loop.c:381 <libdriver_runtime.so>+0x1503a9 #37 0x000022e2aa229a82 in async_loop_run_once(async_loop_t*, zx_time_t) ../../zircon/system/ulib/async-loop/loop.c:330 <libdriver_runtime.so>+0x149a82 #38 0x000022e2aa229102 in async_loop_run(async_loop_t*, zx_time_t, _Bool) ../../zircon/system/ulib/async-loop/loop.c:288 <libdriver_runtime.so>+0x149102 #39 0x000022e2aa22aeb7 in async_loop_run_thread(void*) ../../zircon/system/ulib/async-loop/loop.c:840 <libdriver_runtime.so>+0x14aeb7 #40 0x000041a874980f1c in start_c11(void*) ../../zircon/third_party/ulib/musl/pthread/pthread_create.c:55 <libc.so>+0xd7f1c #41 0x000041a874aabe8d in thread_trampoline(uintptr_t, uintptr_t) ../../zircon/system/ulib/runtime/thread.cc:100 <libc.so>+0x202e8d Link: acpica/acpica@c1470833 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> (cherry picked from commit 24d9609) Orabug: 37311726 Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reviewed-by: Thomas Tai <thomas.tai@oracle.com> Signed-off-by: Vijayendra Suman <vijayendra.suman@oracle.com>
The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Orabug: 35383105 (cherry picked from commit 37c3b9f) cherry-pick-repo=kernel/git/torvalds/linux.git unmodified-from-upstream: 37c3b9f Signed-off-by: Mikhael Goikhman <migo@nvidia.com> Signed-off-by: Qing Huang <qing.huang@oracle.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Orabug: 35383105 (cherry picked from commit 37c3b9f) cherry-pick-repo=kernel/git/torvalds/linux.git unmodified-from-upstream: 37c3b9f Signed-off-by: Mikhael Goikhman <migo@nvidia.com> Signed-off-by: Qing Huang <qing.huang@oracle.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Orabug: 35383105 (cherry picked from commit 37c3b9f) cherry-pick-repo=kernel/git/torvalds/linux.git unmodified-from-upstream: 37c3b9f Signed-off-by: Mikhael Goikhman <migo@nvidia.com> Signed-off-by: Qing Huang <qing.huang@oracle.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Orabug: 35383105 (cherry picked from commit 37c3b9f) cherry-pick-repo=kernel/git/torvalds/linux.git unmodified-from-upstream: 37c3b9f Signed-off-by: Mikhael Goikhman <migo@nvidia.com> Signed-off-by: Qing Huang <qing.huang@oracle.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Orabug: 35383105 (cherry picked from commit 37c3b9f) cherry-pick-repo=kernel/git/torvalds/linux.git unmodified-from-upstream: 37c3b9f Signed-off-by: Mikhael Goikhman <migo@nvidia.com> Signed-off-by: Qing Huang <qing.huang@oracle.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Orabug: 35383105 (cherry picked from commit 37c3b9f) cherry-pick-repo=kernel/git/torvalds/linux.git unmodified-from-upstream: 37c3b9f Signed-off-by: Mikhael Goikhman <migo@nvidia.com> Signed-off-by: Qing Huang <qing.huang@oracle.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Orabug: 35383105 (cherry picked from commit 37c3b9f) cherry-pick-repo=kernel/git/torvalds/linux.git unmodified-from-upstream: 37c3b9f Signed-off-by: Mikhael Goikhman <migo@nvidia.com> Signed-off-by: Qing Huang <qing.huang@oracle.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Add a check to mlx5e_xmit() for shorter frames. A corrupted/malformed packet, with shorter length can eventually cause system panic further down in the code path. Avoid it by validating the length and dropping it at the earliest. Following is seen in our env with shorter skb->len crash> bt PID: 76981 TASK: ff19828cfe508000 CPU: 106 COMMAND: "vhost-76942" #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd [exception RIP: memcpy_erms+6] RIP: ffffffffae261ab6 RSP: ff2d20159b39f700 RFLAGS: 00010293 RAX: ff198291741ecf2e RBX: ff19828e70d6a100 RCX: fffffffffea1af2b RDX: fffffffffffffffd RSI: ff19828eba6d7e5e RDI: ff198291757d2000 RBP: ff2d20159b39f760 R8: ff198291741ecf00 R9: 000000000000037c R10: 000000000000003c R11: ff19828ffe953940 R12: ff198291741ecf20 R13: ff198267dcb1b600 R14: ff19828eeebb09c0 R15: ff198291741ecf00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core] #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core] #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q] #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan] #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap] #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net] #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net] #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net] #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net] #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost] #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364 This change was discussed with Nvidia and they are in agreement. Orabug: 36660755 Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration") Reported-and-tested-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Add a check to mlx5e_xmit() for shorter frames. A corrupted/malformed packet, with shorter length can eventually cause system panic further down in the code path. Avoid it by validating the length and dropping it at the earliest. Following is seen in our env with shorter skb->len crash> bt PID: 76981 TASK: ff19828cfe508000 CPU: 106 COMMAND: "vhost-76942" #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd [exception RIP: memcpy_erms+6] RIP: ffffffffae261ab6 RSP: ff2d20159b39f700 RFLAGS: 00010293 RAX: ff198291741ecf2e RBX: ff19828e70d6a100 RCX: fffffffffea1af2b RDX: fffffffffffffffd RSI: ff19828eba6d7e5e RDI: ff198291757d2000 RBP: ff2d20159b39f760 R8: ff198291741ecf00 R9: 000000000000037c R10: 000000000000003c R11: ff19828ffe953940 R12: ff198291741ecf20 R13: ff198267dcb1b600 R14: ff19828eeebb09c0 R15: ff198291741ecf00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core] #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core] #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q] #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan] #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap] #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net] #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net] #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net] #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net] #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost] #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364 This change was discussed with Nvidia and they are in agreement. Orabug: 36879156 CVE: CVE-2024-41090 CVE: CVE-2024-41091 Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration") Reported-and-tested-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com> (cherry picked from commit 0dd4b99) Orabug: 36879126 CVE: CVE-2024-41090 CVE: CVE-2024-41091 Signed-off-by: Harshvardhan Jha <harshvardhan.j.jha@oracle.com> Reviewed-by: Vijayendra Suman <vijayendra.suman@oracle.com>
Add a check to mlx5e_xmit() for shorter frames. A corrupted/malformed packet, with shorter length can eventually cause system panic further down in the code path. Avoid it by validating the length and dropping it at the earliest. Following is seen in our env with shorter skb->len crash> bt PID: 76981 TASK: ff19828cfe508000 CPU: 106 COMMAND: "vhost-76942" #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd [exception RIP: memcpy_erms+6] RIP: ffffffffae261ab6 RSP: ff2d20159b39f700 RFLAGS: 00010293 RAX: ff198291741ecf2e RBX: ff19828e70d6a100 RCX: fffffffffea1af2b RDX: fffffffffffffffd RSI: ff19828eba6d7e5e RDI: ff198291757d2000 RBP: ff2d20159b39f760 R8: ff198291741ecf00 R9: 000000000000037c R10: 000000000000003c R11: ff19828ffe953940 R12: ff198291741ecf20 R13: ff198267dcb1b600 R14: ff19828eeebb09c0 R15: ff198291741ecf00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core] #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core] #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q] #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan] #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap] #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net] #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net] #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net] #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net] #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost] #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364 This change was discussed with Nvidia and they are in agreement. Orabug: 36879156 CVE: CVE-2024-41090 CVE: CVE-2024-41091 Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration") Reported-and-tested-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com> (cherry picked from commit 0dd4b99) Orabug: 36879126 CVE: CVE-2024-41090 CVE: CVE-2024-41091 Signed-off-by: Harshvardhan Jha <harshvardhan.j.jha@oracle.com> Reviewed-by: Vijayendra Suman <vijayendra.suman@oracle.com>
Add a check to mlx5e_xmit() for shorter frames. A corrupted/malformed packet, with shorter length can eventually cause system panic further down in the code path. Avoid it by validating the length and dropping it at the earliest. Following is seen in our env with shorter skb->len crash> bt PID: 76981 TASK: ff19828cfe508000 CPU: 106 COMMAND: "vhost-76942" #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd [exception RIP: memcpy_erms+6] RIP: ffffffffae261ab6 RSP: ff2d20159b39f700 RFLAGS: 00010293 RAX: ff198291741ecf2e RBX: ff19828e70d6a100 RCX: fffffffffea1af2b RDX: fffffffffffffffd RSI: ff19828eba6d7e5e RDI: ff198291757d2000 RBP: ff2d20159b39f760 R8: ff198291741ecf00 R9: 000000000000037c R10: 000000000000003c R11: ff19828ffe953940 R12: ff198291741ecf20 R13: ff198267dcb1b600 R14: ff19828eeebb09c0 R15: ff198291741ecf00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core] #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core] #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q] #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan] #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap] #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net] #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net] #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net] #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net] #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost] #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364 This change was discussed with Nvidia and they are in agreement. Orabug: 36879156 CVE: CVE-2024-41090 CVE: CVE-2024-41091 Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration") Reported-and-tested-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com> (cherry picked from commit 0dd4b99) Orabug: 36879126 CVE: CVE-2024-41090 CVE: CVE-2024-41091 Signed-off-by: Harshvardhan Jha <harshvardhan.j.jha@oracle.com> Reviewed-by: Vijayendra Suman <vijayendra.suman@oracle.com>
Description:
Since the UEK7u2, SR-IOV is broken for Mellanox ConnectX4, if one or both ports are configured for ethernet mode.
The same works fine if the ports are configure for IB mode.
Additionally, everything is fine on UEK7u1 regardless of the configuration.
Diagnostic info:
Port 0 is in IB mode
Port 1 is in ETH mode
Output of lspci :
lspci |grep Mellanox
0b:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
0b:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
0b:00.2 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0b:00.3 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0b:00.4 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0b:00.5 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0b:00.6 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0b:00.7 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
Output of from dmesg:
dmesg |grep mlx
[ 1.349024] mlx5_core 0000:0b:00.0: firmware version: 12.28.2006
[ 1.349050] mlx5_core 0000:0b:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[ 4.071873] mlx5_core 0000:0b:00.0: Port module event: module 0, Cable plugged
[ 4.265254] mlx5_core 0000:0b:00.1: firmware version: 12.28.2006
[ 4.265304] mlx5_core 0000:0b:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[ 4.683438] mlx5_core 0000:0b:00.1: E-Switch: Total vports 10, per vport: max uc(1024) max mc(16384)
[ 4.687324] mlx5_core 0000:0b:00.1: Port module event: module 1, Cable plugged
[ 4.932149] mlx5_core 0000:0b:00.1: Supported tc offload range - chains: 4294967294, prios: 4294967295
[ 4.941115] mlx5_core 0000:0b:00.1: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0 basic)
[ 4.943224] mlx5_core 0000:0b:00.1 enp11s0f1np1: renamed from eth0
[ 5.655608] mlx5_core 0000:0b:00.1: Successfully registered panic handler for port 1
[ 5.773746] mlx5_core 0000:0b:00.1: mlx5_cmd_out_err:803:(pid 814): QUERY_HCA_CAP(0x100) op_mod(0x40) failed, status bad parameter(0x3), syndrome (0x5add95), err(-22)
[ 5.787044] mlx5_core 0000:0b:00.1: mlx5_device_enable_sriov:82:(pid 814): failed to enable eswitch SRIOV (-22)
[ 5.787047] mlx5_core 0000:0b:00.1: mlx5_sriov_enable:168:(pid 814): mlx5_device_enable_sriov failed : -22
[ 5.787222] mlx5_core 0000:0b:00.1 mlxen0: renamed from enp11s0f1np1
[ 6.213460] mlx5_core 0000:0b:00.0 mlxib0: renamed from ib0
[ 7.054459] mlx5_core 0000:0b:00.1 mlxen0: Link up
[ 7.057046] 8021q: adding VLAN 0 to HW filter on device mlxen0
[ 8.055388] IPv6: ADDRCONF(NETDEV_CHANGE): mlxen0: link becomes ready
[ 8.284101] IPv6: ADDRCONF(NETDEV_CHANGE): mlxib0: link becomes ready
[ 8.557689] IPv6: ADDRCONF(NETDEV_CHANGE): mlxib0: link becomes ready
[ 8.580025] br1: port 1(mlxen0) entered blocking state
[ 8.580028] br1: port 1(mlxen0) entered disabled state
[ 8.580067] device mlxen0 entered promiscuous mode
[ 8.580957] br1: port 1(mlxen0) entered blocking state
[ 8.580960] br1: port 1(mlxen0) entered listening state
[ 8.595823] mlx5_core 0000:0b:00.1: mlx5e_fs_set_rx_mode_work:843:(pid 156): S-tagged traffic will be dropped while C-tag vlan stripping is enabled
[ 10.635172] br1: port 1(mlxen0) entered learning state
[ 25.931048] br1: port 1(mlxen0) entered forwarding state
Manually trying to add VFs to the ETH port results with the following error:
echo 0 > /sys/class/net/mlxen0/device/sriov_numvfs
echo 7 > /sys/class/net/mlxen0/device/sriov_numvfs
-bash: echo: write error: Invalid argument
The same works just fine for the IB port:
echo 0 > /sys/class/net/mlxib0/device/sriov_numvfs
echo 7 > /sys/class/net/mlxib0/device/sriov_numvfs
The text was updated successfully, but these errors were encountered: