linux

mirror of synced 2025-03-06 20:59:54 +01:00

Author	SHA1	Message	Date
Taehee Yoo	1fa89ffbc0	net: sfc: ef10: fix memory leak in efx_ef10_mtd_probe() In the NIC ->probe() callback, ->mtd_probe() callback is called. If NIC has 2 ports, ->probe() is called twice and ->mtd_probe() too. In the ->mtd_probe(), which is efx_ef10_mtd_probe() it allocates and initializes mtd partition. But mtd partition for sfc is shared data. So that allocated mtd partition data from last called efx_ef10_mtd_probe() will not be used. Therefore it must be freed. But efx_ef10_mtd_probe() does not free the unused mtd partition data. kmemleak reports: unreferenced object 0xffff88811ddb0000 (size 63168): comm "systemd-udevd", pid 265, jiffies 4294681048 (age 348.586s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffffa3767749>] kmalloc_order_trace+0x19/0x120 [<ffffffffa3873f0e>] __kmalloc+0x20e/0x250 [<ffffffffc041389f>] efx_ef10_mtd_probe+0x11f/0x270 [sfc] [<ffffffffc0484c8a>] efx_pci_probe.cold.17+0x3df/0x53d [sfc] [<ffffffffa414192c>] local_pci_probe+0xdc/0x170 [<ffffffffa4145df5>] pci_device_probe+0x235/0x680 [<ffffffffa443dd52>] really_probe+0x1c2/0x8f0 [<ffffffffa443e72b>] __driver_probe_device+0x2ab/0x460 [<ffffffffa443e92a>] driver_probe_device+0x4a/0x120 [<ffffffffa443f2ae>] __driver_attach+0x16e/0x320 [<ffffffffa4437a90>] bus_for_each_dev+0x110/0x190 [<ffffffffa443b75e>] bus_add_driver+0x39e/0x560 [<ffffffffa4440b1e>] driver_register+0x18e/0x310 [<ffffffffc02e2055>] 0xffffffffc02e2055 [<ffffffffa3001af3>] do_one_initcall+0xc3/0x450 [<ffffffffa33ca574>] do_init_module+0x1b4/0x700 Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Fixes: `8127d661e7` ("sfc: Add support for Solarflare SFC9100 family") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://lore.kernel.org/r/20220512054709.12513-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-12 10:01:36 -07:00
Amit Cohen	810c2f0a3f	mlxsw: Avoid warning during ip6gre device removal IPv6 addresses which are used for tunnels are stored in a hash table with reference counting. When a new GRE tunnel is configured, the driver is notified and configures it in hardware. Currently, any change in the tunnel is not applied in the driver. It means that if the remote address is changed, the driver is not aware of this change and the first address will be used. This behavior results in a warning [1] in scenarios such as the following: # ip link add name gre1 type ip6gre local 2000::3 remote 2000::fffe tos inherit ttl inherit # ip link set name gre1 type ip6gre local 2000::3 remote 2000::ffff ttl inherit # ip link delete gre1 The change of the address is not applied in the driver. Currently, the driver uses the remote address which is stored in the 'parms' of the overlay device. When the tunnel is removed, the new IPv6 address is used, the driver tries to release it, but as it is not aware of the change, this address is not configured and it warns about releasing non existing IPv6 address. Fix it by using the IPv6 address which is cached in the IPIP entry, this address is the last one that the driver used, so even in cases such the above, the first address will be released, without any warning. [1]: WARNING: CPU: 1 PID: 2197 at drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2920 mlxsw_sp_ipv6_addr_put+0x146/0x220 [mlxsw_spectrum] ... CPU: 1 PID: 2197 Comm: ip Not tainted 5.17.0-rc8-custom-95062-gc1e5ded51a9a #84 Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 07/12/2021 RIP: 0010:mlxsw_sp_ipv6_addr_put+0x146/0x220 [mlxsw_spectrum] ... 
Call Trace: <TASK> mlxsw_sp2_ipip_rem_addr_unset_gre6+0xf1/0x120 [mlxsw_spectrum] mlxsw_sp_netdevice_ipip_ol_event+0xdb/0x640 [mlxsw_spectrum] mlxsw_sp_netdevice_event+0xc4/0x850 [mlxsw_spectrum] raw_notifier_call_chain+0x3c/0x50 call_netdevice_notifiers_info+0x2f/0x80 unregister_netdevice_many+0x311/0x6d0 rtnl_dellink+0x136/0x360 rtnetlink_rcv_msg+0x12f/0x380 netlink_rcv_skb+0x49/0xf0 netlink_unicast+0x233/0x340 netlink_sendmsg+0x202/0x440 ____sys_sendmsg+0x1f3/0x220 ___sys_sendmsg+0x70/0xb0 __sys_sendmsg+0x54/0xa0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: `e846efe273` ("mlxsw: spectrum: Add hash table for IPv6 address mapping") Reported-by: Maksym Yaremchuk <maksymy@nvidia.com> Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20220511115747.238602-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-05-12 13:21:16 +02:00
Bin Chen	e0d0e1fdf1	nfp: VF rate limit support Add VF rate limit feature. This patch enhances the NFP driver to support assignment of both max_tx_rate and min_tx_rate to VFs. The configuration templates below are all supported, e.g.: # ip link set $DEV vf $VF_NUM max_tx_rate $RATE_VALUE # ip link set $DEV vf $VF_NUM min_tx_rate $RATE_VALUE # ip link set $DEV vf $VF_NUM max_tx_rate $RATE_VALUE \ min_tx_rate $RATE_VALUE # ip link set $DEV vf $VF_NUM min_tx_rate $RATE_VALUE \ max_tx_rate $RATE_VALUE The max RATE_VALUE is limited to 0xFFFF which is about 63Gbps (using 1024 for 1G) Signed-off-by: Bin Chen <bin.chen@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-05-12 13:03:08 +02:00
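The 0xFFFF limit and the "about 63Gbps" figure in the nfp commit above can be checked with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the rate value is expressed in Mbit/s, with 1 Gbit/s counted as 1024 Mbit/s as the commit message states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest rate value the commit message says is accepted. */
#define VF_RATE_MAX 0xFFFFu

/* Assumption: rates are in Mbit/s and 1 Gbit/s is counted as 1024 Mbit/s. */
#define MBPS_PER_GBPS 1024u

/* Hypothetical helper: true if the requested rate fits the 16-bit limit. */
bool vf_rate_is_valid(uint32_t rate_mbps)
{
	return rate_mbps <= VF_RATE_MAX;
}

/* Whole Gbit/s represented by a rate value (fractional part truncated). */
uint32_t vf_rate_whole_gbps(uint32_t rate_mbps)
{
	return rate_mbps / MBPS_PER_GBPS;
}
```

With these assumptions, 0xFFFF (65535) divided by 1024 truncates to 63, which matches the figure quoted in the commit message.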
Colin Ian King	982c97eede	net: ethernet: SP7021: Fix spelling mistake "Interrput" -> "Interrupt" There is a spelling mistake in a dev_dbg message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20220511104448.150800-1-colin.i.king@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-05-12 12:15:15 +02:00
Vladimir Oltean	0f84d403b8	net: enetc: kill PHY-less mode for PFs Right now, a PHY-less port (no phy-mode, no fixed-link, no phy-handle) doesn't register with phylink, but calls netif_carrier_on() from enetc_start(). This makes sense for a VF, but for a PF, this is braindead, because we never call enetc_mac_enable() so the MAC is left inoperational. Furthermore, commit `71b77a7a27` ("enetc: Migrate to PHYLINK and PCS_LYNX") put the nail in the coffin because it removed the initial netif_carrier_off() call done right after register_netdev(). Without that call, netif_carrier_on() does not call linkwatch_fire_event(), so the operstate remains IF_OPER_UNKNOWN. Just deny the broken configuration by requiring that a phy-mode is present, and always register a PF with phylink. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://lore.kernel.org/r/20220511094200.558502-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-05-12 11:05:43 +02:00
Kees Cook	43213daed6	fortify: Provide a memcpy trap door for sharp corners As we continue to narrow the scope of what the FORTIFY memcpy() will accept and build alternative APIs that give the compiler appropriate visibility into more complex memcpy scenarios, there is a need for "unfortified" memcpy use in rare cases where combinations of compiler behaviors, source code layout, etc, result in cases where the stricter memcpy checks need to be bypassed until appropriate solutions can be developed (i.e. fix compiler bugs, code refactoring, new API, etc). The intention is for this to be used only if there's no other reasonable solution, for its use to include a justification that can be used to assess future solutions, and for it to be temporary. Example usage included, based on analysis and discussion from: https://lore.kernel.org/netdev/CANn89iLS_2cshtuXPyNUGDPaic=sJiYfvTb_wNLgWrZRyBxZ_g@mail.gmail.com Cc: Jakub Kicinski <kuba@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Coco Li <lixiaoyan@google.com> Cc: Tariq Toukan <tariqt@nvidia.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: netdev@vger.kernel.org Cc: linux-hardening@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220511025301.3636666-1-keescook@chromium.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-05-12 10:49:23 +02:00
Florian Fainelli	6b77c06655	net: bcmgenet: Check for Wake-on-LAN interrupt probe deferral The interrupt controller supplying the Wake-on-LAN interrupt line may be modular on some platforms (irq-bcm7038-l1.c) and might be probed at a later time than the GENET driver. We need to specifically check for -EPROBE_DEFER and propagate that error to ensure that we eventually fetch the interrupt descriptor. Fixes: `9deb48b53e` ("bcmgenet: add WOL IRQ check") Fixes: `5b1f0e6294` ("net: bcmgenet: Avoid touching non-existent interrupt") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Stefan Wahren <stefan.wahren@i2se.com> Link: https://lore.kernel.org/r/20220511031752.2245566-1-f.fainelli@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-05-12 09:58:58 +02:00
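The deferral handling described in the bcmgenet commit above can be sketched in plain C. This is not the driver's code: `wol_irq_resolve()` is a hypothetical helper, the lookup result is passed in as a plain integer, and treating every other failure as "no Wake-on-LAN interrupt" is an assumption made for illustration. Only the value of `EPROBE_DEFER` (517) is taken from the kernel.

```c
/* EPROBE_DEFER is kernel-internal (517) and absent from userspace errno.h. */
#define EPROBE_DEFER 517

/*
 * Hypothetical helper. lookup_ret mimics an IRQ lookup result:
 * a positive IRQ number on success, a negative errno on failure.
 * Returns 0 and stores the IRQ (0 meaning "no WoL IRQ"), or
 * -EPROBE_DEFER so the caller can abort and be probed again later.
 */
int wol_irq_resolve(int lookup_ret, int *wol_irq)
{
	/* The interrupt provider is not ready yet: propagate the deferral. */
	if (lookup_ret == -EPROBE_DEFER)
		return -EPROBE_DEFER;

	/* Assumption: any other failure means the optional IRQ is absent. */
	*wol_irq = lookup_ret > 0 ? lookup_ret : 0;
	return 0;
}
```

The point of the pattern is that the deferral error is singled out before any fallback, so a late-loading interrupt controller leads to a retried probe rather than a silently missing interrupt.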
Yang Yingliang	00832b1d1a	net: ethernet: mediatek: ppe: fix wrong size passed to memset() 'foe_table' is a pointer, so the real size of struct mtk_foe_entry should be passed to memset(). Fixes: `ba37b7caf1` ("net: ethernet: mtk_eth_soc: add support for initializing the PPE") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20220511030829.3308094-1-yangyingliang@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-05-12 09:44:14 +02:00
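The class of bug named in the mediatek commit above (taking `sizeof` of a pointer instead of the pointed-to type) can be shown with a small standalone sketch. `struct foe_entry` and `foe_table_clear()` are hypothetical stand-ins, not the driver's definitions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct mtk_foe_entry; the real layout differs. */
struct foe_entry {
	unsigned int data[20];
};

/*
 * Zero `count` entries of the table.
 *
 * sizeof(table) would be the size of the pointer itself (typically
 * 4 or 8 bytes), so only a fraction of each entry would be cleared.
 * sizeof(*table) is the size of one struct foe_entry.
 */
void foe_table_clear(struct foe_entry *table, size_t count)
{
	memset(table, 0, count * sizeof(*table));
}
```

Writing `sizeof(*table)` rather than `sizeof(struct foe_entry)` keeps the size expression correct even if the element type is later changed.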
Po Liu	285e8dedb4	net: enetc: count the tc-taprio window drops The enetc scheduler for IEEE 802.1Qbv has 2 options (depending on PTGCR[TG_DROP_DISABLE]) when we attempt to send an oversized packet which will never fit in its allotted time slot for its traffic class: either block the entire port due to head-of-line blocking, or drop the packet and set a bit in the writeback format of the transmit buffer descriptor, allowing other packets to be sent. We obviously choose the second option in the driver, but we do not detect the drop condition, so from the perspective of the network stack, the packet is sent and no error counter is incremented. This change checks the writeback of the TX BD when tc-taprio is enabled, and increments a specific ethtool statistics counter and a generic "tx_dropped" counter in ndo_get_stats64. Signed-off-by: Po Liu <Po.Liu@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-11 16:37:10 -07:00
Vladimir Oltean	32bf8e1f6f	net: enetc: manage ENETC_F_QBV in priv->active_offloads only when enabled Future work in this driver would like to look at priv->active_offloads & ENETC_F_QBV to determine whether a tc-taprio qdisc offload was installed, but this does not produce the intended effect. All the other flags in priv->active_offloads are managed dynamically, except ENETC_F_QBV which is set statically based on the probed SI capability. This change makes priv->active_offloads & ENETC_F_QBV really track the presence of a tc-taprio schedule on the port. Some existing users, like the enetc_sched_speed_set() call from phylink_mac_link_up(), are best kept using the old logic: the tc-taprio offload does not re-trigger another link mode resolve, so the scheduler needs to be functional from the get go, as long as Qbv is supported at all on the port. So to preserve functionality there, look at the static station interface capability from pf->si->hw_features instead. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-11 16:37:10 -07:00
Robert Hancock	138badbc21	net: macb: use NAPI for TX completion path This driver was using the TX IRQ handler to perform all TX completion tasks. Under heavy TX network load, this can cause significant irqs-off latencies (found to be in the hundreds of microseconds using ftrace). This can cause other issues, such as overrunning serial UART FIFOs when using high baud rates with limited UART FIFO sizes. Switch to using a NAPI poll handler to perform the TX completion work to get this out of hard IRQ context and avoid the IRQ latency impact. A separate NAPI instance is used for TX and RX to avoid checking the other ring's state unnecessarily when doing the poll, and so that the NAPI budget handling can work for both TX and RX packets. A new per-queue tx_ptr_lock spinlock has been added to avoid using the main device lock (with IRQs needing to be disabled) across the entire TX mapping operation, and also to protect the TX queue pointers from concurrent access between the TX start and TX poll operations. The TX Used Bit Read interrupt (TXUBR) handling also needs to be moved into the TX NAPI poll handler to maintain the proper order of operations. A flag is used to notify the poll handler that a UBR condition needs to be handled. The macb_tx_restart handler has had some locking added for global register access, since this could now potentially happen concurrently on different queues. Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-11 16:14:13 -07:00
Robert Hancock	1900e30d0e	net: macb: simplify/cleanup NAPI reschedule checking Previously the macb_poll method was checking the RSR register after completing its RX receive work to see if additional packets had been received since IRQs were disabled, since this controller does not maintain the pending IRQ status across IRQ disable. It also had to double-check the register after re-enabling IRQs to detect if packets were received after the first check but before IRQs were enabled. Using the RSR register for this purpose is problematic since it reflects the global device state rather than the per-queue state, so if packets are being received on multiple queues it may end up retriggering receive on a queue where the packets did not actually arrive and not on the one where they did arrive. This will also cause problems with an upcoming change to use NAPI for the TX path where use of multiple queues is more likely. Add a macb_rx_pending function to check the RX ring to see if more packets have arrived in the queue, and use that to check if NAPI should be rescheduled rather than the RSR register. By doing this, we can just ignore the global RSR register entirely, and thus save some extra device register accesses at the same time. This also makes the previous first check for pending packets rather redundant, since it would be checking the RX ring state which was just checked in the receive work function. Therefore we can get rid of it and just check after enabling interrupts whether packets are already pending. Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-11 16:14:13 -07:00
Xiaomeng Tong	3f95a7472d	i40e: i40e_main: fix a missing check on list iterator The bug is here: ret = i40e_add_macvlan_filter(hw, ch->seid, vdev->dev_addr, &aq_err); The list iterator 'ch' will point to a bogus position containing HEAD if the list is empty or no element is found. This case must be checked before any use of the iterator, otherwise it will lead to an invalid memory access. To fix this bug, use a new variable 'iter' as the list iterator, while using the original variable 'ch' as a dedicated pointer to the found element. Cc: stable@vger.kernel.org Fixes: `1d8d80b4e4` ("i40e: Add macvlan support on i40e") Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220510204846.2166999-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-11 15:19:28 -07:00
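The iterator pattern described in the i40e commit above can be sketched outside the kernel with a minimal circular list. Everything here is hypothetical (`struct list_node`, `struct channel`, `chan_find()` and the other helpers); it only mirrors the idea of walking with a separate `iter` and keeping a dedicated result pointer that stays NULL unless a match is found.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, modelled loosely on kernel lists. */
struct list_node {
	struct list_node *next, *prev;
};

/* Hypothetical element type; `id` plays the role of the lookup key. */
struct channel {
	int id;
	struct list_node node;
};

/* Recover the containing struct channel from its embedded list node. */
#define chan_entry(ptr) \
	((struct channel *)((char *)(ptr) - offsetof(struct channel, node)))

void chan_list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

void chan_list_add_tail(struct list_node *item, struct list_node *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/*
 * Return the channel with the given id, or NULL if the list is empty
 * or nothing matches. `iter` is only valid inside the loop; `found`
 * is the dedicated result pointer and is never derived from the head.
 */
struct channel *chan_find(struct list_node *head, int id)
{
	struct channel *found = NULL;
	struct list_node *pos;

	for (pos = head->next; pos != head; pos = pos->next) {
		struct channel *iter = chan_entry(pos);

		if (iter->id == id) {
			found = iter;
			break;
		}
	}
	return found;
}
```

A caller checks the result for NULL before dereferencing it, which is the check the commit message says was missing.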
Jakub Kicinski	ddae9bc467	Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 1GbE Intel Wired LAN Driver Updates 2022-05-10 This series contains updates to igc driver only. Sasha cleans up the code by removing an unused function and removing an enum for PHY type as there is only one PHY. The return type for igc_check_downshift() is changed to void as it always returns success. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: igc: Change type of the 'igc_check_downshift' method igc: Remove unused phy_type enum igc: Remove igc_set_spd_dplx method ==================== Link: https://lore.kernel.org/r/20220510210656.2168393-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-11 15:11:32 -07:00
Alex Williamson	920df8d6ef	Merge tag 'mlx5-lm-parallel' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into v5.19/vfio/next Improve mlx5 live migration driver From Yishai: This series improves mlx5 live migration driver in few aspects as of below. Refactor to enable running migration commands in parallel over the PF command interface. To achieve that we exposed from mlx5_core an API to let the VF be notified before that the PF command interface goes down/up. (e.g. PF reload upon health recovery). Once having the above functionality in place mlx5 vfio doesn't need any more to obtain the global PF lock upon using the command interface but can rely on the above mechanism to be in sync with the PF. This can enable parallel VFs migration over the PF command interface from kernel driver point of view. In addition, Moved to use the PF async command mode for the SAVE state command. This enables returning earlier to user space upon issuing successfully the command and improve latency by let things run in parallel. Alex, as this series touches mlx5_core we may need to send this in a pull request format to VFIO to avoid conflicts before acceptance. Link: https://lore.kernel.org/all/20220510090206.90374-1-yishaih@nvidia.com Signed-of-by: Leon Romanovsky <leonro@nvidia.com>	2022-05-11 13:08:49 -06:00
Jakub Kicinski	01f4685797	eth: amd: remove NI6510 support (ni65) Looks like all the changes to this driver have been tree-wide refactoring since the git era began. The driver is using virt_to_bus(); we should make it use more modern DMA APIs, but since it's unlikely to be getting any use these days, delete it instead. We can always revert to bring it back. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-11 13:09:59 +01:00
Grant Grundler	2120b7f4d1	net: atlantic: verify hw_head_ lies within TX buffer ring Bounds check hw_head index provided by NIC to verify it lies within the TX buffer ring. Reported-by: Aashay Shringarpure <aashay@google.com> Reported-by: Yi Chou <yich@google.com> Reported-by: Shervin Oloumi <enlightened@google.com> Signed-off-by: Grant Grundler <grundler@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-11 12:25:07 +01:00
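The bounds check described in the atlantic commit above amounts to refusing a device-reported index that does not fall inside the ring. The sketch below is a generic illustration, not the driver's code: the helper names, the `-1` error return and the wraparound arithmetic are assumptions.

```c
#include <stdbool.h>

/*
 * Hypothetical helper. hw_head is an index reported by the device and
 * must be treated as untrusted; ring_size is the number of descriptors.
 */
bool hw_head_is_valid(unsigned int hw_head, unsigned int ring_size)
{
	return hw_head < ring_size;
}

/*
 * Number of descriptors completed between sw_head and hw_head,
 * accounting for wraparound, or -1 if hw_head lies outside the ring.
 * Assumes sw_head is driver-owned and already within the ring.
 */
int tx_completed_count(unsigned int sw_head, unsigned int hw_head,
		       unsigned int ring_size)
{
	if (!hw_head_is_valid(hw_head, ring_size))
		return -1;

	if (hw_head >= sw_head)
		return (int)(hw_head - sw_head);

	return (int)(ring_size - sw_head + hw_head);
}
```

Validating the index before using it to walk the ring keeps a misbehaving or malicious device from steering the driver outside its buffer array.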
Grant Grundler	6aecbba12b	net: atlantic: add check for MAX_SKB_FRAGS Enforce that the CPU can not get stuck in an infinite loop. Reported-by: Aashay Shringarpure <aashay@google.com> Reported-by: Yi Chou <yich@google.com> Reported-by: Shervin Oloumi <enlightened@google.com> Signed-off-by: Grant Grundler <grundler@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-11 12:25:07 +01:00
Grant Grundler	79784d77eb	net: atlantic: reduce scope of is_rsc_complete Don't defer handling the err case outside the loop. That's pointless. And since is_rsc_complete is only used inside this loop, declare it inside the loop to reduce its scope. Signed-off-by: Grant Grundler <grundler@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-11 12:25:07 +01:00
Grant Grundler	62e0ae0f40	net: atlantic: fix "frag[0] not initialized" In aq_ring_rx_clean(), if buff->is_eop is not set AND buff->len < AQ_CFG_RX_HDR_SIZE, then hdr_len remains equal to buff->len and skb_add_rx_frag(xxx, 0, ...) is not called. The loop following this code starts calling skb_add_rx_frag() starting with i=1 and thus frag[0] is never initialized. Since i is initialized to zero at the top of the primary loop, we can just reference and post-increment i instead of hardcoding the 0 when calling skb_add_rx_frag() the first time. Reported-by: Aashay Shringarpure <aashay@google.com> Reported-by: Yi Chou <yich@google.com> Reported-by: Shervin Oloumi <enlightened@google.com> Signed-off-by: Grant Grundler <grundler@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-11 12:25:07 +01:00
David S. Miller	dc3a2001f6	Merge tag 'mlx5-updates-2022-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2022-05-09 1) Gavin Li, adds exit route from waiting for FW init on device boot and increases FW init timeout on health recovery flow 2) Support 4 ports HCAs LAG mode Mark Bloch Says: ================ This series adds to mlx5 drivers support for 4 ports HCAs. Starting with ConnectX-7 HCAs with 4 ports are possible. As most driver parts aren't affected by such configuration most driver code is unchanged. Specifically the only affected areas are: - Lag - Devcom - Merged E-Switch - Single FDB E-Switch Lag was chosen to be converted first. Creating hardware LAG when all 4 ports are added to the same bond device. Devcom, merged E-Switch and single FDB E-Switch, are marked as supporting only 2 ports HCAs and future patches will add support for 4 ports HCAs. In order to activate the hardware lag a user can execute the: ip link add bond0 type bond ip link set bond0 type bond miimon 100 mode 2 ip link set eth2 master bond0 ip link set eth3 master bond0 ip link set eth4 master bond0 ip link set eth5 master bond0 Where eth2, eth3, eth4 and eth5 are the PFs of the same HCA. ================ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-11 12:12:27 +01:00
Yang Yingliang	0807ce0b01	net: stmmac: fix missing pci_disable_device() on error in stmmac_pci_probe() Switch to using pcim_enable_device() to avoid missing pci_disable_device(). Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220510031316.1780409-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-10 19:12:57 -07:00
Martin Habets	c5a13c319e	sfc: Add a basic Siena module Make the (un)load message more specific to differentiate it from the sfc.ko messages. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-10 15:38:15 -07:00
Martin Habets	782f713084	sfc/siena: Inline functions in sriov.h to avoid conflicts with sfc The implementation of each is quite short. This means sriov.c is not needed any more. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-10 15:38:15 -07:00
Martin Habets	c8443b6982	sfc/siena: Rename functions in nic_common.h to avoid conflicts with sfc For siena use efx_siena_ as the function prefix. efx_nic_update_stats_atomic is only used in efx_common.c, so move it there. efx_nic_copy_stats is not used in Siena, so it is removed. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-10 15:38:15 -07:00
Martin Habets	4d49e5cd4b	sfc/siena: Rename functions in mcdi headers to avoid conflicts with sfc For siena use efx_siena_ as the function prefix. Several functions are not used in Siena, so they are removed. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-10 15:38:15 -07:00
Martin Habets	95e96f7788	sfc/siena: Rename peripheral functions to avoid conflicts with sfc For siena use efx_siena_ as the function prefix. This patch covers selftest.h, ptp.h, net_driver.h and ethtool_common.h. efx_ethtool_fill_self_tests() can become static. Some functions in ptp.c can also become static. Rename loopback_mode in net_driver.h. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-10 15:38:15 -07:00
Martin Habets	7f9e4b2a61	sfc/siena: Rename RX/TX functions to avoid conflicts with sfc For siena use efx_siena_ as the function prefix. Several functions are not used in Siena, so they are removed. Use a Siena specific variable name for module parameter efx_separate_tx_channels. Move efx_fini_tx_queue() to avoid a forward declaration of efx_dequeue_buffer(). Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-10 15:38:14 -07:00
Martin Habets	71ad88f661	sfc/siena: Rename functions in efx headers to avoid conflicts with sfc When building with allyesconfig there are many identical symbol names. For siena use efx_siena_ as the function and variable prefix to avoid build errors. efx_mtd_remove_partition can become static as it is no longer called from other files. efx_ticks_to_usecs and efx_xmit_done_single are not used in Siena, so they are removed. Several functions are only used inside efx_channels.c for Siena so they can become static. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-10 15:38:14 -07:00
Martin Habets	956f2d86cb	sfc/siena: Remove build references to missing functionality Functionality not supported or needed on Siena includes: - Anything for EF100 - EF10 specifics such as register access, PIO and TSO offload. Also only bind to Siena NICs. Remove EF10 specifics from nic.h. The functions that start with efx_farch_ will be removed from sfc.ko with a subsequent patch. Add the efx_ prefix to siena_prepare_flush() to make it consistent with the other APIs. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-10 15:38:14 -07:00
Martin Habets	d48523cb88	sfc: Copy shared files needed for Siena (part 2) These are the files starting with m through w. No changes are done, those will be done with subsequent commits. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-10 15:38:14 -07:00
Martin Habets	6e173d3b4a	sfc: Copy shared files needed for Siena (part 1) These are the files starting with b through i. No changes are done, those will be done with subsequent commits. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-10 15:38:14 -07:00
Martin Habets	36ff639329	sfc: Move Siena specific files Files are only moved, no changes are made. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-10 15:38:13 -07:00
Louis Peens	61004d1d4b	nfp: flower: fix 'variable 'flow6' set but not used' Kernel test robot reported an issue after a recent patch about an unused variable when CONFIG_IPV6 is disabled. Move the variable declaration to be inside the #ifdef, and do a bit more cleanup. There is no need to use a temporary ipv6 bool value, it is just checked once, remove the extra variable and just do the check directly. Fixes: `9d5447ed44` ("nfp: flower: fixup ipv6/ipv4 route lookup for neigh events") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20220510074845.41457-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-10 15:15:12 -07:00
Sasha Neftin	95073d0815	igc: Change type of the 'igc_check_downshift' method The 'igc_check_downshift' method always returns 0; there is no need for a return value so change the type of this method to void. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-05-10 14:02:53 -07:00
Sasha Neftin	7241069f7a	igc: Remove unused phy_type enum Complete to commit `8e153faf58` ("igc: Remove unused phy type") i225 parts have only one PHY. There is no point to use phy_type enum. Clean up the code accordingly, and get rid of the unused enum lines. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-05-10 14:02:40 -07:00
Sasha Neftin	d098538ed4	igc: Remove igc_set_spd_dplx method igc_set_spd_dplx method is not used. This patch comes to tidy up the driver code. Reported-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-05-10 14:02:18 -07:00
Yishai Hadas	846e437387	net/mlx5: Expose mlx5_sriov_blocking_notifier_register / unregister APIs Expose mlx5_sriov_blocking_notifier_register / unregister APIs to let a VF register to be notified of its enablement / disablement by the PF. Upon VF probe it will call mlx5_sriov_blocking_notifier_register() with its notifier block and upon VF remove it will call mlx5_sriov_blocking_notifier_unregister() to drop its registration. This gives a VF the ability to clean up some resources upon disable, before the command interface goes down, and to set up some state before it is enabled. This may be used by a migration-capable VF in a few cases (e.g. PF load/unload upon a health recovery). Link: https://lore.kernel.org/r/20220510090206.90374-2-yishaih@nvidia.com Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2022-05-10 15:45:28 +03:00
Wells Lu	fd3040b939	net: ethernet: Add driver for Sunplus SP7021 Add driver for Sunplus SP7021 SoC. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Wells Lu <wellslutw@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-05-10 11:31:32 +02:00
Manuel Ullmann	1809c30b6e	net: atlantic: always deep reset on pm op, fixing up my null deref regression The impact of this regression is the same for resume that I saw on thaw: the kernel hangs and nothing except SysRq rebooting can be done. Fixes regression in commit `cbe6c3a8f8` ("net: atlantic: invert deep par in pm functions, preventing null derefs"), where I disabled deep pm resets in suspend and resume, trying to make sense of the atl_resume_common() deep parameter in the first place. It turns out, that atlantic always has to deep reset on pm operations. Even though I expected that and tested resume, I screwed up by kexec-rebooting into an unpatched kernel, thus missing the breakage. This fixup obsoletes the deep parameter of atl_resume_common, but I leave the cleanup for the maintainers to post to mainline. Suspend and hibernation were successfully tested by the reporters. Fixes: `cbe6c3a8f8` ("net: atlantic: invert deep par in pm functions, preventing null derefs") Link: https://lore.kernel.org/regressions/9-Ehc_xXSwdXcvZqKD5aSqsqeNj5Izco4MYEwnx5cySXVEc9-x_WC4C3kAoCqNTi-H38frroUK17iobNVnkLtW36V6VWGSQEOHXhmVMm5iQ=@protonmail.com/ Reported-by: Jordan Leppert <jordanleppert@protonmail.com> Reported-by: Holger Hoffstaette <holger@applied-asynchrony.com> Tested-by: Jordan Leppert <jordanleppert@protonmail.com> Tested-by: Holger Hoffstaette <holger@applied-asynchrony.com> CC: <stable@vger.kernel.org> # 5.10+ Signed-off-by: Manuel Ullmann <labre@posteo.de> Link: https://lore.kernel.org/r/87bkw8dfmp.fsf@posteo.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-05-10 10:29:54 +02:00
Gerhard Engleder	0abb62b682	tsnep: Add free running cycle counter support The TSN endpoint Ethernet MAC supports a free running counter additionally to its clock. This free running counter can be read and hardware timestamps are supported. As the name implies, this counter cannot be set and its frequency cannot be adjusted. Add free running cycle counter support based on this free running counter to physical clock. This also requires hardware time stamps based on that free running counter. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-05-10 09:48:09 +02:00
Jakub Kicinski	b3552d6a3b	eth: dpaa2-mac: remove a dead-code NULL check on fwnode parent Since commit `4e30e98c4b` ("dpaa2-mac: return -EPROBE_DEFER from dpaa2_mac_open in case the fwnode is not set") @parent can't be NULL after the if. It's either the address of the ->fwnode of @dpmacs or @fwnode in case of ACPI. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20220506200029.852310-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-05-10 09:19:56 +02:00
Mark Bloch	7f46a0b732	net/mlx5: Lag, add debugfs to query hardware lag state Lag state has become very complicated with many modes, flags, types and port selection methods and future work will add additional features. Add a debugfs to query the current lag state. A new directory named "lag" will be created under the mlx5 debugfs directory. As the driver has debugfs per pci function the location will be: <debugfs>/mlx5/<BDF>/lag For example: /sys/kernel/debug/mlx5/0000:08:00.0/lag The following files are exposed: - state: Returns "active" or "disabled". If "active" it means hardware lag is active. - members: Returns the BDFs of all the members of lag object. - type: Returns the type of the lag currently configured. Valid only if hardware lag is active. * "roce" - Members are bare metal PFs. * "switchdev" - Members are in switchdev mode. * "multipath" - ECMP offloads. - port_sel_mode: Returns the egress port selection method, valid only if hardware lag is active. * "queue_affinity" - Egress port is selected by the QP/SQ affinity. * "hash" - Egress port is selected by hash done on each packet. Controlled by: xmit_hash_policy of the bond device. - flags: Returns flags that are specific per lag @type. Valid only if hardware lag is active. * "shared_fdb" - "on" or "off", if "on" single FDB is used. - mapping: Returns the mapping which is used to select egress port. Valid only if hardware lag is active. If @port_sel_mode is "hash" returns the active egress ports. The hash result will select only active ports. If @port_sel_mode is "queue_affinity" returns the mapping between the configured port affinity of the QP/SQ and actual egress port. For example: * 1:1 - Mapping means if the configured affinity is port 1 traffic will egress via port 1. * 1:2 - Mapping means if the configured affinity is port 1 traffic will egress via port 2. This can happen if port 1 is down or in active/backup mode and port 1 is backup.
Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-05-09 22:54:04 -07:00
Mark Bloch	352899f384	net/mlx5: Lag, use buckets in hash mode When in hardware lag and the NIC has more than 2 ports, if one port goes down the traffic needs to be distributed between the remaining active ports. For better spread in such cases, instead of using 1-to-1 mapping and only 4 slots in the hash, use many. Each port will have many slots that point to it. When a port goes down, go over all the slots that pointed to that port and spread them between the remaining active ports. Once the port comes back, restore the default mapping. We will have number_of_ports * MLX5_LAG_MAX_HASH_BUCKETS slots. Each group of MLX5_LAG_MAX_HASH_BUCKETS slots belongs to a different port. The native mapping is such that: port 1: The first MLX5_LAG_MAX_HASH_BUCKETS slots are: [1, 1, .., 1] which means if a packet is hashed into one of these slots it will hit the wire via port 1. port 2: The second MLX5_LAG_MAX_HASH_BUCKETS slots are: [2, 2, .., 2] which means if a packet is hashed into one of these slots it will hit the wire via port 2. This mapping is the same for the rest of the ports. On a failover, let's say port 2 goes down (ports 1, 3, 4 are still up). The new mapping for port 2 will be: port 2: The second MLX5_LAG_MAX_HASH_BUCKETS slots are: [1, 3, 1, 4, .., 4] which means the mapping was changed from the native mapping to a mapping that consists of only the active ports. With this, if a port goes down the traffic will be split between the active ports randomly. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-05-09 22:54:03 -07:00
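The bucket scheme described in 352899f384 can be illustrated with a small standalone sketch. This is not the driver's code: `BUCKETS_PER_PORT` is a hypothetical stand-in for MLX5_LAG_MAX_HASH_BUCKETS (its real value is not given in the log), `lag_native_mapping()` and `lag_remap()` are made-up names, and the buckets of a down port are spread round-robin here, whereas the commit message says the split is random.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PORTS        4   /* illustrative; the series raises the port limit to 4 */
#define BUCKETS_PER_PORT 8   /* hypothetical stand-in for MLX5_LAG_MAX_HASH_BUCKETS */

/* Native mapping: every bucket owned by port p (1-based) points to port p. */
static void lag_native_mapping(uint8_t *map, int num_ports)
{
	for (int p = 0; p < num_ports; p++)
		for (int b = 0; b < BUCKETS_PER_PORT; b++)
			map[p * BUCKETS_PER_PORT + b] = (uint8_t)(p + 1);
}

/*
 * Rebuild the mapping for the current link state. Buckets of ports that
 * are up keep the native mapping; buckets of ports that are down are
 * spread over the active ports (round-robin in this sketch).
 */
static void lag_remap(uint8_t *map, int num_ports, const bool *port_up)
{
	uint8_t active[MAX_PORTS];
	int num_active = 0;

	assert(num_ports > 0 && num_ports <= MAX_PORTS);

	for (int p = 0; p < num_ports; p++)
		if (port_up[p])
			active[num_active++] = (uint8_t)(p + 1);
	assert(num_active > 0); /* at least one port must be up */

	lag_native_mapping(map, num_ports);

	for (int p = 0; p < num_ports; p++) {
		if (port_up[p])
			continue;
		for (int b = 0; b < BUCKETS_PER_PORT; b++)
			map[p * BUCKETS_PER_PORT + b] = active[b % num_active];
	}
}
```

With four ports and port 2 down, the second group of buckets in this sketch becomes [1, 3, 4, 1, 3, 4, ..] while the other groups keep their native values; calling `lag_remap()` again once port 2 is back up restores [2, 2, .., 2].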
Mark Bloch	24b3599eff	net/mlx5: Lag, refactor dmesg print Combine dmesg lag prints into a single function. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-05-09 22:54:03 -07:00
Mark Bloch	4cd14d44b1	net/mlx5: Support devices with more than 2 ports Increase the define MLX5_MAX_PORTS to 4 as the driver is ready to support NICs with 4 ports. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-05-09 22:54:03 -07:00
Mark Bloch	7e978e7714	net/mlx5: Lag, use actual number of lag ports Refactor the entire lag code to use ldev->ports instead of hard-coded defines (like MLX5_MAX_PORTS) for its operations. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-05-09 22:54:02 -07:00
Mark Bloch	cdf611d170	net/mlx5: Lag, use hash when in roce lag on 4 ports Downstream patches will add support for lag over 4 ports. In that mode we will only use hash as the uplink selection method. Using hash instead of queue affinity (before this patch) offers key advantages like: - Aligns the port selection method with the method used by the bond device - Better packet distribution where a single queue can transmit from multiple ports (with queue affinity a queue is bound to a single port regardless of the packet being sent). - In case of failover, traffic is split between multiple ports and not a single one like in queue affinity. Going forward it was decided that queue affinity will be deprecated as using hash provides a better user experience, which means on 4-port HCAs hash will always be used. Future work will add hash support for 2-port HCAs as well. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-05-09 22:54:02 -07:00
Mark Bloch	e2c45931ff	net/mlx5: Lag, support single FDB only on 2 ports E-Switch currently doesn't support more than 2 E-Switch managers being aggregated under a single hardware lag. Have specific checks to disallow creating lag when the code doesn't support it. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-05-09 22:54:02 -07:00
Mark Bloch	e9d5bb51c5	net/mlx5: Lag, store number of ports inside lag object Store the number of lag ports inside the lag object. Lag object is a single shared object managing the lag state of multiple mlx5 devices on the same physical HCA. Downstream patches will allow hardware lag to be created over devices with more than 2 ports. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-05-09 22:54:02 -07:00


42672 commits