linux

mirror of synced 2025-03-06 20:59:54 +01:00

Author	SHA1	Message	Date
Ido Schimmel	332fdf951d	mlxsw: thermal: Fix out-of-bounds memory accesses Currently, mlxsw allows cooling states to be set above the maximum cooling state supported by the driver: # cat /sys/class/thermal/thermal_zone2/cdev0/type mlxsw_fan # cat /sys/class/thermal/thermal_zone2/cdev0/max_state 10 # echo 18 > /sys/class/thermal/thermal_zone2/cdev0/cur_state # echo $? 0 This results in out-of-bounds memory accesses when thermal state transition statistics are enabled (CONFIG_THERMAL_STATISTICS=y), as the transition table is accessed with a too large index (state) [1]. According to the thermal maintainer, it is the responsibility of the driver to reject such operations [2]. Therefore, return an error when the state to be set exceeds the maximum cooling state supported by the driver. To avoid dead code, as suggested by the thermal maintainer [3], partially revert commit `a421ce088a` ("mlxsw: core: Extend cooling device with cooling levels") that tried to interpret these invalid cooling states (above the maximum) in a special way. The cooling levels array is not removed in order to prevent the fans going below 20% PWM, which would cause them to get stuck at 0% PWM. [1] BUG: KASAN: slab-out-of-bounds in thermal_cooling_device_stats_update+0x271/0x290 Read of size 4 at addr ffff8881052f7bf8 by task kworker/0:0/5 CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.15.0-rc3-custom-45935-gce1adf704b14 #122 Hardware name: Mellanox Technologies Ltd. "MSN2410-CB2FO"/"SA000874", BIOS 4.6.5 03/08/2016 Workqueue: events_freezable_power_ thermal_zone_device_check Call Trace: dump_stack_lvl+0x8b/0xb3 print_address_description.constprop.0+0x1f/0x140 kasan_report.cold+0x7f/0x11b thermal_cooling_device_stats_update+0x271/0x290 __thermal_cdev_update+0x15e/0x4e0 thermal_cdev_update+0x9f/0xe0 step_wise_throttle+0x770/0xee0 thermal_zone_device_update+0x3f6/0xdf0 process_one_work+0xa42/0x1770 worker_thread+0x62f/0x13e0 kthread+0x3ee/0x4e0 ret_from_fork+0x1f/0x30 Allocated by task 1: kasan_save_stack+0x1b/0x40 __kasan_kmalloc+0x7c/0x90 thermal_cooling_device_setup_sysfs+0x153/0x2c0 __thermal_cooling_device_register.part.0+0x25b/0x9c0 thermal_cooling_device_register+0xb3/0x100 mlxsw_thermal_init+0x5c5/0x7e0 __mlxsw_core_bus_device_register+0xcb3/0x19c0 mlxsw_core_bus_device_register+0x56/0xb0 mlxsw_pci_probe+0x54f/0x710 local_pci_probe+0xc6/0x170 pci_device_probe+0x2b2/0x4d0 really_probe+0x293/0xd10 __driver_probe_device+0x2af/0x440 driver_probe_device+0x51/0x1e0 __driver_attach+0x21b/0x530 bus_for_each_dev+0x14c/0x1d0 bus_add_driver+0x3ac/0x650 driver_register+0x241/0x3d0 mlxsw_sp_module_init+0xa2/0x174 do_one_initcall+0xee/0x5f0 kernel_init_freeable+0x45a/0x4de kernel_init+0x1f/0x210 ret_from_fork+0x1f/0x30 The buggy address belongs to the object at ffff8881052f7800 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 1016 bytes inside of 1024-byte region [ffff8881052f7800, ffff8881052f7c00) The buggy address belongs to the page: page:0000000052355272 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1052f0 head:0000000052355272 order:3 compound_mapcount:0 compound_pincount:0 flags: 0x200000000010200(slab\|head\|node=0\|zone=2) raw: 0200000000010200 ffffea0005034800 0000000300000003 ffff888100041dc0 raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8881052f7a80: 00 00 00 00 00 00 04 fc fc fc fc fc fc fc fc fc ffff8881052f7b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff8881052f7b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff8881052f7c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8881052f7c80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [2] https://lore.kernel.org/linux-pm/9aca37cb-1629-5c67-1895-1fdc45c0244e@linaro.org/ [3] https://lore.kernel.org/linux-pm/af9857f2-578e-de3a-e62b-6baff7e69fd4@linaro.org/ CC: Daniel Lezcano <daniel.lezcano@linaro.org> Fixes: `a50c1e3565` ("mlxsw: core: Implement thermal zone") Fixes: `a421ce088a` ("mlxsw: core: Extend cooling device with cooling levels") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Vadim Pasternak <vadimp@nvidia.com> Link: https://lore.kernel.org/r/20211012174955.472928-1-idosch@idosch.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-14 07:13:26 -07:00
Maxim Mikityanskiy	84c8a87402	net/mlx5e: Fix division by 0 in mlx5e_select_queue for representors Commit `846d6da1fc` ("net/mlx5e: Fix division by 0 in mlx5e_select_queue") makes mlx5e_build_nic_params assign a non-zero initial value to priv->num_tc_x_num_ch, so that mlx5e_select_queue doesn't fail with division by 0 if called before the first activation of channels. However, the initialization flow of representors doesn't call mlx5e_build_nic_params, so this bug can still happen with representors. This commit fixes the bug by adding the missing assignment to mlx5e_build_rep_params. Fixes: `846d6da1fc` ("net/mlx5e: Fix division by 0 in mlx5e_select_queue") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-12 13:52:03 -07:00
Aya Levin	0bc73ad46a	net/mlx5e: Mutually exclude RX-FCS and RX-port-timestamp Due to current HW arch limitations, RX-FCS (scattering FCS frame field to software) and RX-port-timestamp (improved timestamp accuracy on the receive side) can't work together. RX-port-timestamp is not controlled by the user and it is enabled by default when supported by the HW/FW. This patch sets RX-port-timestamp opposite to RX-FCS configuration. Fixes: `102722fc68` ("net/mlx5e: Add support for RXFCS feature flag") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-12 13:51:59 -07:00
Saeed Mahameed	b2107cdc43	net/mlx5e: Switchdev representors are not vlan challenged Before this patch, mlx5 representors advertised the NETIF_F_VLAN_CHALLENGED bit, this could lead to missing features when using reps with vxlan/bridge and maybe other virtual interfaces, when such interfaces inherit this bit and block vlan usage in their topology. Example: $ip link add dev bridge type bridge # add representor interface to the bridge $ip link set dev pf0hpf master $ip link add link bridge name vlan10 type vlan id 10 protocol 802.1q Error: 8021q: VLANs not supported on device. Reps are perfectly capable of handling vlan traffic, although they don't implement vlan_{add,kill}_vid ndos, hence, remove NETIF_F_VLAN_CHALLENGED advertisement. Fixes: `cb67b83292` ("net/mlx5e: Introduce SRIOV VF representors") Reported-by: Roopa Prabhu <roopa@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com>	2021-10-12 13:51:56 -07:00
Valentine Fatiev	94b960b9de	net/mlx5e: Fix memory leak in mlx5_core_destroy_cq() error path Prior to this patch in case mlx5_core_destroy_cq() failed it returns without completing all destroy operations and that leads to memory leak. Instead, complete the destroy flow before return error. Also move mlx5_debug_cq_remove() to the beginning of mlx5_core_destroy_cq() to be symmetrical with mlx5_core_create_cq(). kmemleak complains on: unreferenced object 0xc000000038625100 (size 64): comm "ethtool", pid 28301, jiffies 4298062946 (age 785.380s) hex dump (first 32 bytes): 60 01 48 94 00 00 00 c0 b8 05 34 c3 00 00 00 c0 `.H.......4..... 02 00 00 00 00 00 00 00 00 db 7d c1 00 00 00 c0 ..........}..... backtrace: [<000000009e8643cb>] add_res_tree+0xd0/0x270 [mlx5_core] [<00000000e7cb8e6c>] mlx5_debug_cq_add+0x5c/0xc0 [mlx5_core] [<000000002a12918f>] mlx5_core_create_cq+0x1d0/0x2d0 [mlx5_core] [<00000000cef0a696>] mlx5e_create_cq+0x210/0x3f0 [mlx5_core] [<000000009c642c26>] mlx5e_open_cq+0xb4/0x130 [mlx5_core] [<0000000058dfa578>] mlx5e_ptp_open+0x7f4/0xe10 [mlx5_core] [<0000000081839561>] mlx5e_open_channels+0x9cc/0x13e0 [mlx5_core] [<0000000009cf05d4>] mlx5e_switch_priv_channels+0xa4/0x230 [mlx5_core] [<0000000042bbedd8>] mlx5e_safe_switch_params+0x14c/0x300 [mlx5_core] [<0000000004bc9db8>] set_pflag_tx_port_ts+0x9c/0x160 [mlx5_core] [<00000000a0553443>] mlx5e_set_priv_flags+0xd0/0x1b0 [mlx5_core] [<00000000a8f3d84b>] ethnl_set_privflags+0x234/0x2d0 [<00000000fd27f27c>] genl_family_rcv_msg_doit+0x108/0x1d0 [<00000000f495e2bb>] genl_family_rcv_msg+0xe4/0x1f0 [<00000000646c5c2c>] genl_rcv_msg+0x78/0x120 [<00000000d53e384e>] netlink_rcv_skb+0x74/0x1a0 Fixes: `e126ba97db` ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Valentine Fatiev <valentinef@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-12 13:51:52 -07:00
Tariq Toukan	ca20dfda05	net/mlx5e: Allow only complete TXQs partition in MQPRIO channel mode Do not allow configurations of MQPRIO channel mode that do not fully define and utilize the channels txqs. Fixes: `ec60c4581b` ("net/mlx5e: Support MQPRIO channel mode") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-12 13:51:48 -07:00
Shay Drory	2266bb1e12	net/mlx5: Fix cleanup of bridge delayed work Currently, bridge cleanup is calling to cancel_delayed_work(). When this function is finished, there is a chance that the delayed work is still running. Also, the delayed work is queueing itself. As a result, we might execute the delayed work after the bridge cleanup have finished and hit a null-ptr oops[1]. Fix it by using cancel_delayed_work_sync(), which is waiting until the work is done and will cancel the queue work. [1] [ 8202.143043 ] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 8202.144438 ] #PF: supervisor write access in kernel mode [ 8202.145476 ] #PF: error_code(0x0002) - not-present page [ 8202.146520 ] PGD 0 P4D 0 [ 8202.147126 ] Oops: 0002 [#1] SMP NOPTI [ 8202.147899 ] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.14.0-rc6_for_upstream_min_debug_2021_08_25_16_06 #1 [ 8202.149741 ] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 8202.151908 ] RIP: 0010:_raw_spin_lock+0xc/0x20 [ 8202.156234 ] RSP: 0018:ffff88846f885ea0 EFLAGS: 00010046 [ 8202.157289 ] RAX: 0000000000000000 RBX: ffff88846f880000 RCX: 0000000000000000 [ 8202.158731 ] RDX: 0000000000000001 RSI: ffff8881004000c8 RDI: 0000000000000000 [ 8202.160177 ] RBP: ffff8881fe684978 R08: ffff888100140000 R09: ffffffff824455b8 [ 8202.161569 ] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 [ 8202.163004 ] R13: 0000000000000012 R14: 0000000000000200 R15: ffff88812992d000 [ 8202.164018 ] FS: 0000000000000000(0000) GS:ffff88846f880000(0000) knlGS:0000000000000000 [ 8202.164960 ] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8202.165634 ] CR2: 0000000000000000 CR3: 0000000108cac004 CR4: 0000000000370ea0 [ 8202.166450 ] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 8202.167807 ] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 8202.168852 ] Call Trace: [ 8202.169421 ] <IRQ> [ 8202.169792 ] __queue_work+0xf2/0x3d0 [ 8202.170481 ] ? queue_work_node+0x40/0x40 [ 8202.171270 ] call_timer_fn+0x2b/0x100 [ 8202.171932 ] __run_timers.part.0+0x152/0x220 [ 8202.172717 ] ? __hrtimer_run_queues+0x171/0x290 [ 8202.173526 ] ? kvm_clock_get_cycles+0xd/0x10 [ 8202.174232 ] ? ktime_get+0x35/0x90 [ 8202.174943 ] run_timer_softirq+0x26/0x50 [ 8202.175745 ] __do_softirq+0xc7/0x271 [ 8202.176373 ] irq_exit_rcu+0x93/0xb0 [ 8202.176983 ] sysvec_apic_timer_interrupt+0x72/0x90 [ 8202.177755 ] </IRQ> [ 8202.178245 ] asm_sysvec_apic_timer_interrupt+0x12/0x20 Fixes: `c636a0f0f3` ("net/mlx5: Bridge, dynamic entry ageing") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-12 13:51:45 -07:00
Aya Levin	3bf1742f3c	net/mlx5e: Mutually exclude setting of TX-port-TS and MQPRIO in channel mode TX-port-TS hijacks the PTP traffic to a specific HW TX-queue. This conflicts with MQPRIO in channel mode, which specifies explicitly which TC accepts the packet. This patch mutually excludes the above configuration. Fixes: `ec60c4581b` ("net/mlx5e: Support MQPRIO channel mode") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-09-30 14:07:57 -07:00
Lama Kayal	dd1979cf3c	net/mlx5e: Fix the presented RQ index in PTP stats PTP-RQ counters title format contains PTP-RQ identifier, which is mistakenly not passed to sprinft(). This leads to unexpected garbage values instead. This patch fixes it. Before applying the patch: ethtool -S eth3 \| grep ptp_rq ptp_rq15_packets: 0 ptp_rq8_bytes: 0 ptp_rq6_csum_complete: 0 ptp_rq14_csum_complete_tail: 0 ptp_rq3_csum_complete_tail_slow : 0 ptp_rq9_csum_unnecessary: 0 ptp_rq1_csum_unnecessary_inner: 0 ptp_rq7_csum_none: 0 ptp_rq10_xdp_drop: 0 ptp_rq9_xdp_redirect: 0 ptp_rq13_lro_packets: 0 ptp_rq12_lro_bytes: 0 ptp_rq10_ecn_mark: 0 ptp_rq9_removed_vlan_packets: 0 ptp_rq5_wqe_err: 0 ptp_rq8_mpwqe_filler_cqes: 0 ptp_rq2_mpwqe_filler_strides: 0 ptp_rq5_oversize_pkts_sw_drop: 0 ptp_rq6_buff_alloc_err: 0 ptp_rq15_cqe_compress_blks: 0 ptp_rq2_cqe_compress_pkts: 0 ptp_rq2_cache_reuse: 0 ptp_rq12_cache_full: 0 ptp_rq11_cache_empty: 256 ptp_rq12_cache_busy: 0 ptp_rq11_cache_waive: 0 ptp_rq12_congst_umr: 0 ptp_rq11_arfs_err: 0 ptp_rq9_recover: 0 After applying the patch: ethtool -S eth3 \| grep ptp_rq ptp_rq0_packets: 0 ptp_rq0_bytes: 0 ptp_rq0_csum_complete: 0 ptp_rq0_csum_complete_tail: 0 ptp_rq0_csum_complete_tail_slow : 0 ptp_rq0_csum_unnecessary: 0 ptp_rq0_csum_unnecessary_inner: 0 ptp_rq0_csum_none: 0 ptp_rq0_xdp_drop: 0 ptp_rq0_xdp_redirect: 0 ptp_rq0_lro_packets: 0 ptp_rq0_lro_bytes: 0 ptp_rq0_ecn_mark: 0 ptp_rq0_removed_vlan_packets: 0 ptp_rq0_wqe_err: 0 ptp_rq0_mpwqe_filler_cqes: 0 ptp_rq0_mpwqe_filler_strides: 0 ptp_rq0_oversize_pkts_sw_drop: 0 ptp_rq0_buff_alloc_err: 0 ptp_rq0_cqe_compress_blks: 0 ptp_rq0_cqe_compress_pkts: 0 ptp_rq0_cache_reuse: 0 ptp_rq0_cache_full: 0 ptp_rq0_cache_empty: 256 ptp_rq0_cache_busy: 0 ptp_rq0_cache_waive: 0 ptp_rq0_congst_umr: 0 ptp_rq0_arfs_err: 0 ptp_rq0_recover: 0 Fixes: `a28359e922` ("net/mlx5e: Add PTP-RX statistics") Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-09-30 14:07:57 -07:00
Shay Drory	f88c487634	net/mlx5: Fix setting number of EQs of SFs When setting number of completion EQs of the SF, consider number of online CPUs. Without this consideration, when number of online cpus are less than 8, unnecessary 8 completion EQs are allocated. Fixes: `c36326d38d` ("net/mlx5: Round-Robin EQs over IRQs") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-09-30 14:07:56 -07:00
Shay Drory	ac8b7d50ae	net/mlx5: Fix length of irq_index in chars The maximum irq_index can be 2047, This means irq_name should have 4 characters reserve for the irq_index. Hence, increase it to 4. Fixes: `3af26495a2` ("net/mlx5: Enlarge interrupt field in CREATE_EQ") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-09-30 14:07:56 -07:00
Aya Levin	99b9a678b2	net/mlx5: Avoid generating event after PPS out in Real time mode When in Real-time mode, HW clock is synced with the PTP daemon. Hence driver should not re-calibrate the next pulse (via MTPPSE repetitive events mechanism). This patch arms repetitive events only in free-running mode. Fixes: `432119de33` ("net/mlx5: Add cyc2time HW translation mode support") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-09-30 14:07:56 -07:00
Aya Levin	6472829470	net/mlx5: Force round second at 1PPS out start time Allow configuration of 1PPS start time only with time-stamp representing a round second. Prior to this patch driver allowed setting of a non-round-second which is not supported by the device. Avoid unexpected behavior by restricting start-time configuration to a round-second. Fixes: `4272f9b88d` ("net/mlx5e: Change 1PPS out scheme") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-09-30 14:07:55 -07:00
Moshe Shemesh	a586775f83	net/mlx5: E-Switch, Fix double allocation of acl flow counter Flow counter is allocated in eswitch legacy acl setting functions without checking if already allocated by previous setting. Add a check to avoid such double allocation. Fixes: `07bab95026` ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes") Fixes: `ea651a86d4` ("net/mlx5: E-Switch, Refactor eswitch egress acl codes") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-09-30 14:07:55 -07:00
Tariq Toukan	7dbc849b2a	net/mlx5e: Improve MQPRIO resiliency * Add netdev->tc_to_txq rollback in case of failure in mlx5e_update_netdev_queues(). * Fix broken transition between the two modes: MQPRIO DCB mode with tc==8, and MQPRIO channel mode. * Disable MQPRIO channel mode if re-attaching with a different number of channels. * Improve code sharing. Fixes: `ec60c4581b` ("net/mlx5e: Support MQPRIO channel mode") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-09-30 14:07:55 -07:00
Tariq Toukan	9d758d4a3a	net/mlx5e: Keep the value for maximum number of channels in-sync The value for maximum number of channels is first calculated based on the netdev's profile and current function resources (specifically, number of MSIX vectors, which depends among other things on the number of online cores in the system). This value is then used to calculate the netdev's number of rxqs/txqs. Once created (by alloc_etherdev_mqs), the number of netdev's rxqs/txqs is constant and we must not exceed it. To achieve this, keep the maximum number of channels in sync upon any netdevice re-attach. Use mlx5e_get_max_num_channels() for calculating the number of netdev's rxqs/txqs. After netdev is created, use mlx5e_calc_max_nch() (which coinsiders core device resources, profile, and netdev) to init or update priv->max_nch. Before this patch, the value of priv->max_nch might get out of sync, mistakenly allowing accesses to out-of-bounds objects, which would crash the system. Track the number of channels stats structures used in a separate field, as they are persistent to suspend/resume operations. All the collected stats of every channel index that ever existed should be preserved. They are reset only when struct mlx5e_priv is, in mlx5e_priv_cleanup(), which is part of the profile changing flow. There is no point anymore in blocking a profile change due to max_nch mismatch in mlx5e_netdev_change_profile(). Remove the limitation. Fixes: `a1f240f180` ("net/mlx5e: Adjust to max number of channles when re-attaching") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-09-30 14:07:54 -07:00
Raed Salem	f9a10440f0	net/mlx5e: IPSEC RX, enable checksum complete Currently in Rx data path IPsec crypto offloaded packets uses csum_none flag, so checksum is handled by the stack, this naturally have some performance/cpu utilization impact on such flows. As Nvidia NIC starting from ConnectX6DX provides checksum complete value out of the box also for such flows there is no sense in taking csum_none path, furthermore the stack (xfrm) have the method to handle checksum complete corrections for such flows i.e. IPsec trailer removal and consequently checksum value adjustment. Because of the above and in addition the ConnectX6DX is the first HW which supports IPsec crypto offload then it is safe to report csum complete for IPsec offloaded traffic. Fixes: `b2ac7541e3` ("net/mlx5e: IPsec: Add Connect-X IPsec Rx data path offload") Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-09-30 14:07:54 -07:00
Aya Levin	fdbccea419	net/mlx4_en: Don't allow aRFS for encapsulated packets Driver doesn't support aRFS for encapsulated packets, return early error in such a case. Fixes: `1eb8c695bd` ("net/mlx4_en: Add accelerated RFS support") Signed-off-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-23 13:17:39 +01:00
Lama Kayal	72a3c58d18	net/mlx4_en: Resolve bad operstate value Any link state change that's done prior to net device registration isn't reflected on the state, thus the operational state is left obsolete, with 'UNKNOWN' status. To resolve the issue, query link state from FW upon open operations to ensure operational state is updated. Fixes: `c27a02cd94` ("mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC") Signed-off-by: Lama Kayal <lkayal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-19 13:21:04 +01:00
David Thompson	ee8a9600b5	mlxbf_gige: clear valid_polarity upon open The network interface managed by the mlxbf_gige driver can get into a problem state where traffic does not flow. In this state, the interface will be up and enabled, but will stop processing received packets. This problem state will happen if three specific conditions occur: 1) driver has received more than (N * RxRingSize) packets but less than (N+1 * RxRingSize) packets, where N is an odd number Note: the command "ethtool -g <interface>" will display the current receive ring size, which currently defaults to 128 2) the driver's interface was disabled via "ifconfig oob_net0 down" during the window described in #1. 3) the driver's interface is re-enabled via "ifconfig oob_net0 up" This patch ensures that the driver's "valid_polarity" field is cleared during the open() method so that it always matches the receive polarity used by hardware. Without this fix, the driver needs to be unloaded and reloaded to correct this problem state. Fixes: `f92e1869d7` ("Add Mellanox BlueField Gigabit Ethernet driver") Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com> Signed-off-by: David Thompson <davthompson@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-16 14:31:58 +01:00
Eli Cohen	7c3a0a018e	net/{mlx5\|nfp\|bnxt}: Remove unnecessary RTNL lock assert Remove the assert from the callback priv lookup function since it does not require RTNL lock and is already protected by flow_indr_block_lock. This will avoid warnings from being emitted to dmesg if the driver registers its callback after an ingress qdisc was created for a netdevice. The warnings started after the following patch was merged: commit `74fc4f8287` ("net: Fix offloading indirect devices dependency on qdisc order creation") Signed-off-by: Eli Cohen <elic@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-16 14:09:30 +01:00
Aya Levin	8db6a54f3c	net/mlx5e: Fix condition when retrieving PTP-rqn When activating the PTP-RQ, redirect the RQT from drop-RQ to PTP-RQ. Use mlx5e_channels_get_ptp_rqn to retrieve the rqn. This helper returns a boolean (not status), hence caller should consider return value 0 as a fail. Change the caller interpretation of the return value. Fixes: `43ec0f41fa` ("net/mlx5e: Hide all implementation details of mlx5e_rx_res") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-09-07 14:17:02 -07:00
Aya Levin	c91c1da72b	net/mlx5e: Fix mutual exclusion between CQE compression and HW TS Some profiles of the driver don't support a dedicated PTP-RQ, hence can't support HW TS and CQE compression simultaneously. When HW TS is enabled the COE compression is disabled, and should be restored when the HW TS is turned off. Add rx_filter as an input to modifying CQE compression to enforce this restriction. Fixes: `256f79d13c` ("net/mlx5e: Fix HW TS with CQE compression according to profile") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-09-07 14:16:58 -07:00
Maor Gottlieb	ee27e330a9	net/mlx5: Fix potential sleeping in atomic context Fixes the below flow of sleeping in atomic context by releasing the RCU lock before calling to free_match_list. build_match_list() <- disables preempt -> free_match_list() -> tree_put_node() -> down_write_ref_node() <- take write lock Fixes: `693c6883bb` ("net/mlx5: Add hash table for flow groups in flow table") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-09-07 14:16:55 -07:00
Saeed Mahameed	dfe6fd72b5	net/mlx5: FWTrace, cancel work on alloc pd error flow Handle error flow on mlx5_core_alloc_pd() failure, read_fw_strings_work must be canceled. Fixes: `c71ad41ccb` ("net/mlx5: FW tracer, events handling") Reported-by: Pavel Machek (CIP) <pavel@denx.de> Suggested-by: Pavel Machek (CIP) <pavel@denx.de> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com>	2021-09-07 14:16:52 -07:00
Mark Bloch	da8252d580	net/mlx5: Lag, don't update lag if lag isn't supported In NICs that don't support LAG, the LAG control structure won't be allocated. If it wasn't allocated it means LAG doesn't exists and can be skipped. Fixes: `cac1eb2cf2` ("net/mlx5: Lag, properly lock eswitch if needed") Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-09-07 14:16:49 -07:00
Parav Pandit	897ae4b40e	net/mlx5: Fix rdma aux device on devlink reload RDMA auxdev parameter registration was skipped for eswitch manager PCI PF. Due to this when devlink parameter is read, it reads as false in below code flow. $ devlink dev reload pci/0000:06:00.0 devlink_reload() mlx5_load_one() mlx5_attach_device() is_ib_enabled() devlink_param_driverinit_value_get() Due to this, is_ib_enabled() returns false for the RDMA auxiliary device. This results into a skipping RDMA auxiliary device creation on reload. There is no need to check for eswitch manager capability to support RDMA auxiliary device. Hence, fix it by skipping eswitch manager capability. Fixes: `87158cedf0` ("net/mlx5: Support enable_rdma devlink dev param") Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-09-07 14:16:46 -07:00
Vlad Buslov	8343268ec3	net/mlx5: Bridge, fix uninitialized variable usage In some conditions variable 'err' is not assigned with value in mlx5_esw_bridge_port_obj_attr_set() and mlx5_esw_bridge_port_changeupper() functions after recent changes to support LAG. Initialize the variable with zero value in both cases. Reported-by: Colin King <colin.king@canonical.com> Reported-by: Tim Gardner <tim.gardner@canonical.com> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> CC: linux-kernel@vger.kernel.org Fixes: `ff9b752146` ("net/mlx5: Bridge, support LAG") Signed-off-by: Vlad Buslov <vladbu@nvidia.com>	2021-09-07 14:16:42 -07:00
Jakub Kicinski	29ce8f9701	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net include/linux/netdevice.h net/socket.c `d0efb16294` ("net: don't unconditionally copy_from_user a struct ifreq for socket ioctls") `876f0bf9d0` ("net: socket: simplify dev_ifconf handling") `29c4964822` ("net: socket: rework compat_ifreq_ioctl()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-31 09:06:04 -07:00
Cai Huoqing	464a57281f	net/mlxbf_gige: Make use of devm_platform_ioremap_resourcexxx() Use the devm_platform_ioremap_resource_byname() helper instead of calling platform_get_resource_byname() and devm_ioremap_resource() separately Use the devm_platform_ioremap_resource() helper instead of calling platform_get_resource() and devm_ioremap_resource() separately Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-31 12:08:05 +01:00
Maxim Mikityanskiy	ca49bfd90a	sch_htb: Fix inconsistency when leaf qdisc creation fails In HTB offload mode, qdiscs of leaf classes are grafted to netdev queues. sch_htb expects the dev_queue field of these qdiscs to point to the corresponding queues. However, qdisc creation may fail, and in that case noop_qdisc is used instead. Its dev_queue doesn't point to the right queue, so sch_htb can lose track of used netdev queues, which will cause internal inconsistencies. This commit fixes this bug by keeping track of the netdev queue inside struct htb_class. All reads of cl->leaf.q->dev_queue are replaced by the new field, the two values are synced on writes, and WARNs are added to assert equality of the two values. The driver API has changed: when TC_HTB_LEAF_DEL needs to move a queue, the driver used to pass the old and new queue IDs to sch_htb. Now that there is a new field (offload_queue) in struct htb_class that needs to be updated on this operation, the driver will pass the old class ID to sch_htb instead (it already knows the new class ID). Fixes: `d03b195b5a` ("sch_htb: Hierarchical QoS hardware offload") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20210826115425.1744053-1-maximmi@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-30 16:33:59 -07:00
Yevgeny Kliteynik	a2ebfbb7b1	net/mlx5: DR, Add support for update FTE Add the support for update FTE, which is needed for cases where there are multiple rules with the same match. In such case fs_core will merge the actions and call update FTE to update current FTE. Since we don't want to disrupt the traffic, we will add the new duplicate rule, and only then remove the old duplicate rule. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-26 15:38:04 -07:00
Yevgeny Kliteynik	8a015baef5	net/mlx5: DR, Improve rule tracking memory consumption To track each STE of the rule a rule member was allocated, each member would point to one STE. This means that we would allocate 40B (rule member) * number of STEs per rule. To reduce this per rule allocation we use the STE tree pointers for next_htbl and pointing STE to navigate the tree, this allows us to keep only the pointer to the last STE of rule (always unique). From the last rule STE we are able to traverse and rebuild all of the STEs that construct the rule. In our testing with 8M rules, each consisting of 7 STES, we were able to reduce 1.6GB of memory. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-26 15:38:04 -07:00
Yevgeny Kliteynik	32c8e3b230	net/mlx5: DR, Remove rehash ctrl struct from dr_htbl The calculations to decide for the maximum allowed collision threshold are simple and there is no reason to save them on the htbl struct. Signed-off-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-26 15:38:04 -07:00
Yevgeny Kliteynik	46f2a8ae8a	net/mlx5: DR, Remove HW specific STE type from nic domain Instead of using the HW specific STEv0 type, it is better to use an enum to indicate if this is an RX or TX nic domain. This means that now we will need to convert the nic domain type to the corresponding STE type. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-26 15:38:03 -07:00
Yevgeny Kliteynik	ab9d1f9612	net/mlx5: DR, Merge DR_STE_SIZE enums Merge DR_STE_SIZE enums - no need for a separate enum for reduced STE size. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-26 15:38:03 -07:00
Yevgeny Kliteynik	990467f8af	net/mlx5: DR, Skip source port matching on FDB RX domain The FDB RX pipe is connected to the wire and the source port for all incoming packets equals to wire, single uplink port per PF, this means there is no point of matching on the source port in such case. Once we recognize such case, we will optimize the RX steering rule. Note that in such case we clean both source_eswitch_owner_vhca_id and source_port. Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-26 15:38:03 -07:00
Yevgeny Kliteynik	63b85f49c0	net/mlx5: DR, Add ignore_flow_level support for multi-dest flow tables When creating an FTE, we might need to create multi-destination flow table, which is eventually created by FW. In such case, this FW table should include all the FTE properties as requested by the upper layer, including the ability to point to another flow table with level lower or equal to the current table - indicated by the "ignore_flow_level" property. Signed-off-by: Chris Mi <cmi@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-26 15:38:02 -07:00
Yevgeny Kliteynik	a01a43fa16	net/mlx5: DR, Use FW API when updating FW-owned flow table Need to call the DR API only when it is DR table. To update FW-owned table the driver should call the FW API. Signed-off-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-26 15:38:02 -07:00
Yevgeny Kliteynik	ae3eddcff7	net/mlx5: DR, replace uintN_t with kernel-style types Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-26 15:38:02 -07:00
Yevgeny Kliteynik	0733535d59	net/mlx5: DR, Support IPv6 matching on flow label for STEv0 Add missing support for matching on IPv6 flow label for STEv0. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-26 15:38:01 -07:00
Bodong Wang	d7d0b2450e	net/mlx5: DR, Reduce print level for FT chaining level check There are usecases with Connection Tracking that have such connection as default, printing this warning in dmesg confuses the user. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-26 15:38:01 -07:00
Yevgeny Kliteynik	d5a84e968f	net/mlx5: DR, Warn and ignore SW steering rule insertion on QP err In the event of SW steering QP entering error state, SW steering cannot insert more rules, and will silently ignore the insertion after issuing a warning. Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-26 15:38:01 -07:00
Yevgeny Kliteynik	f35715a657	net/mlx5: DR, Improve error flow in actions_build_ste_arr Improve error flow and print actions sequence when an invalid/unsupported sequence provided. Signed-off-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-26 15:38:01 -07:00
Yevgeny Kliteynik	ec449ed823	net/mlx5: DR, Enable QP retransmission Under high stress, SW steering might get stuck on polling for completion that never comes. For such cases QP needs to have protocol retransmission mechanism enabled. Currently the retransmission timeout is defined as 0 (unlimited). Fix this by defining a real timeout. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-26 15:38:00 -07:00
Yevgeny Kliteynik	2de40f68cf	net/mlx5: DR, Enable VLAN pop on TX and VLAN push on RX Enable pop VLAN action in TX and push VLAN in RX. These actions are supported only on STEv1. On TX: when a host sends a packet, VLAN is popped at the beginning. On RX: just before passing the packet to the host the VLAN is pushed. Signed-off-by: Muhammad Sammar <muhammads@nvidia.com> Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-26 15:38:00 -07:00
Yevgeny Kliteynik	f5e22be534	net/mlx5: DR, Split modify VLAN state to separate pop/push states Split modify vlan state in the actions state machine to pop vlan and push vlan states. This enables using of pop/push vlan without restrictions (e.g. pop vlan on TX in STEv1). Signed-off-by: Muhammad Sammar <muhammads@nvidia.com> Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-26 15:38:00 -07:00
Yevgeny Kliteynik	0139145fb8	net/mlx5: DR, Added support for REMOVE_HEADER packet reformat ConnectX supports offloading of various encapsulations and decapsulations (e.g. VXLAN), which are performed by 'Packet Reformat' action. Starting with ConnectX-6 DX, a new reformat type is supported - REMOVE_HEADER, which allows deleting an arbitrary size chunk at the selected position in the packet. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-26 15:37:59 -07:00
Wentao_Liang	6cc64770fb	net/mlx5: DR, fix a potential use-after-free bug In line 849 (#1), "mlx5dr_htbl_put(cur_htbl);" drops the reference to cur_htbl and may cause cur_htbl to be freed. However, cur_htbl is subsequently used in the next line, which may result in an use-after-free bug. Fix this by calling mlx5dr_err() before the cur_htbl is put. Signed-off-by: Wentao_Liang <Wentao_Liang_g@163.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-26 15:15:42 -07:00
Dmytro Linkin	f9d196bd63	net/mlx5e: Use correct eswitch for stack devices with lag If link aggregation is used within stack devices driver rejects encap rules if PF of the VF tunnel device is down. This happens because route resolved for other PF and its eswitch instance is used to determine correct vport. To fix that use devcom feature to retrieve other eswitch instance if failed to find vport for the 1st eswitch and LAG is active. Fixes: `10742efc20` ("net/mlx5e: VF tunnel TX traffic offloading") Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-26 15:15:42 -07:00

1 2 3 4 5 ...

7907 commits