linux

mirror of synced 2025-03-06 20:59:54 +01:00

Author	SHA1	Message	Date
Jesse Brandeburg	0092db5fac	ice: trivial: fix odd indenting Fix an odd indent where some code was left indented, and causes smatch to warn: ice_log_pkg_init() warn: inconsistent indenting While here, for consistency, add a break after the default case. This commit has a Fixes: but we caught this while it was only in net-next. Fixes: `247dd97d71` ("ice: Refactor status flow for DDP load") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Link: https://lore.kernel.org/r/20211221230538.2546315-1-jesse.brandeburg@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-22 14:57:55 -08:00
Jakub Kicinski	2030eddced	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2021-12-21 This series contains updates to ice driver only. Karol modifies the reset flow to correct issues with PTP reset. Jake extends PTP support for E822 based devices. This includes a few cleanup patches, that fix some minor issues. In addition, there are some slight refactors to ease the addition of E822 support, followed by adding the new hardware implementation ice_ptp_hw.c. There are a few major differences with E822 support compared to E810 support: ) The E822 device has a Clock Generation Unit which must be initialized in order to generate proper clock frequencies on the output that drives the PTP hardware clock registers ) The E822 PHY is a bit different and requires a more complex initialization procedure which must be rerun any time the link configuration changes. ) The E822 devices support enhanced timestamp calibration by making use of a process called Vernier offset measurement. This allows the hardware to measure phase offset related to the PHY clocks for Serdes and FEC, reducing the inaccuracy of the timestamp relative to the actual packet transmission and receipt. Making use of this requires data gathered from the first transmitted and received packets, and waiting for the PHY to complete the calibration measurements. This is done as part of a new kthread, ov_work. Note that to avoid delay in enabling timestamps, we start the PHY in 'bypass' mode which allows timestamps to be captured without the Vernier calibration measurement. Once the first packets have been sent and received, we then complete the calibration setup and exit bypass mode and begin using the more precise timestamps. According to the datasheet, timestamps without calibration data can be incorrect relative to actual receipt or transmission by up to 1 clock cycle (~1.25 nanoseconds), while calibrated timestamps should be correct to within 1/8th of a clock cycle (~0.15 nanoseconds). ) E822 devices support crosstimestamping via PCIe PTM, which we enable when available on the platform. There is a fair amount of logic required to perform PHY and CGU initialization, which is the vast majority of the new code, but it is fairly self contained within ice_ptp_hw.c, with the exception of monitoring for offset validity being handled by a kthread. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: support crosstimestamping on E822 devices if supported ice: exit bypass mode once hardware finishes timestamp calibration ice: ensure the hardware Clock Generation Unit is configured ice: implement basic E822 PTP support ice: convert clk_freq capability into time_ref ice: introduce ice_ptp_init_phc function ice: use 'int err' instead of 'int status' in ice_ptp_hw.c ice: PTP: move setting of tstamp_config ice: introduce ice_base_incval function ice: Fix E810 PTP reset flow ==================== Link: https://lore.kernel.org/r/20211221174845.3063640-1-anthony.l.nguyen@intel.com Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-22 14:25:42 -08:00
Xiang wangx	37cf276df1	fm10k: Fix syntax errors in comments Delete the redundant word 'by'. Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-21 09:17:47 -08:00
Karen Sornek	630f6edc48	igbvf: Refactor trace Refactoring "PF still resetting" message, because previous version looked like a bug - it informed about changes that worked as designed but might confuse users. Changes requested to make message more user-friendly. Signed-off-by: Karen Sornek <karen.sornek@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-21 09:17:47 -08:00
Jason Wang	890781af31	igb: remove never changed variable `ret_val' The variable used for return status in `igb_write_xmdio_reg' function is never changed and this function is just need return 0. Thus, the `ret_val' can be removed and return 0 at the end of the `igb_write_xmdio_reg' function. Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-21 09:17:47 -08:00
Sasha Neftin	b8773a66f6	igc: Remove obsolete define 'MII_CR_FULL_DUPLEX' define not in use. This patch comes to tidy up obsolete define. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-21 09:17:47 -08:00
Sasha Neftin	d2a66dd3fd	igc: Remove obsolete mask 'IGC_CTRL_EXT_LINK_MODE_MASK' not in use. This patch comes to tidy up obsolete define. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-21 09:17:47 -08:00
Sasha Neftin	2a8807a765	igc: Remove obsolete nvm type i225 devices use only spi nvm type. This patch comes to tidy up obsolete nvm types. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-21 09:17:47 -08:00
Sasha Neftin	8e153faf58	igc: Remove unused phy type _phy_none type not in use. Clean up the code accordingly, and get rid of the unused enum line Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-21 09:17:47 -08:00
Sasha Neftin	7a34cda1ee	igc: Remove unused _I_PHY_ID define _I_PHY_ID not in use. Clean up the code accordingly, and get rid of the unused define Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-21 09:17:47 -08:00
Jacob Keller	13a64f0b98	ice: support crosstimestamping on E822 devices if supported E822 devices on supported platforms can generate a cross timestamp between the platform ART and the device time. This process allows for very precise measurement of the difference between the PTP hardware clock and the platform time. This is only supported if we know the TSC frequency relative to ART, so we do not enable this unless the boot CPU has a known TSC frequency (as required by convert_art_ns_to_tsc). Because PCIe PTM support is not available on all platforms, introduce CONFIG_ICE_HWTS and make it depend on X86 where we know the support exists. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-21 09:11:40 -08:00
Jacob Keller	a69f1cb62a	ice: exit bypass mode once hardware finishes timestamp calibration Once the E822 device has sent and received one packet, the hardware computes the internal delay of the PHY using a process known as Vernier calibration. This calibration calculates a more accurate offset for the Tx and Rx timestamps. To make use of this offset, we need to exit the bypass mode. This cannot be done until the PHY has completed offset calibration, as indicated by the offset valid bits. To handle this, introduce a kthread work item which will poll the offset valid bits every few milliseconds seeing if it is safe to exit bypass mode. Once we have finished calibrating the offsets, we can program the total Tx and Rx offset registers and turn off the bypass bit. This allows the hardware to include the more precise vernier calibration offset, and improves the timestamp precision. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-21 09:11:40 -08:00
Jacob Keller	b111ab5a11	ice: ensure the hardware Clock Generation Unit is configured The E822 device has a Clock Generation Unit (CGU) responsible for determining the clock frequency that drives the timers. Ensure this function is initialized when bringing up the PTP support, so that the clock has a known frequency. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-21 09:11:40 -08:00
Jacob Keller	3a7496234d	ice: implement basic E822 PTP support Implement support for the basic operations needed to enable the PTP hardware clock on E822 devices. This includes implementations for the various PHY access functions, as well as the ability to start and stop the PHY timers. This is different from the E810 device because the configuration depends on link speed, so we cannot just start the PHYs immediately. We must wait until the link is up to get proper values for the speed based initialization. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-21 09:11:37 -08:00
Jacob Keller	405efa49b5	ice: convert clk_freq capability into time_ref Convert the clk_freq value into the associated time_ref frequency value for E822 devices. This simplifies determining the time reference value for the clock. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-21 09:11:02 -08:00
Jacob Keller	b2ee72565c	ice: introduce ice_ptp_init_phc function When we enable support for E822 devices, there are some additional steps required to initialize the PTP hardware clock. To make this easier to implement as device-specific behavior, refactor the register setups in ice_ptp_init_owner to a new ice_ptp_init_phc function defined in ice_ptp_hw.c This function will have a common section, and an e810 specific sub-implementation. This will enable easily extending the functionality to cover the E822 specific setup required to initialize the hardware clock generation unit. It also makes it clear which steps are E810 specific vs which ones are necessary for all ice devices. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-21 09:11:02 -08:00
Jacob Keller	39b2810642	ice: use 'int err' instead of 'int status' in ice_ptp_hw.c The ice_ptp_hw.c file introduced a bunch of uses of "int status" instead of the more traditional "int err" or "int ret". These are actually traditional Linux error codes (as opposed to the recently removed ice_status enumeration values). We're about to add a bunch of new functions to ice_ptp_hw.c. It's normally preferred in the ice driver to use "int ret" or "int err" when dealing with error code values. Instead of making the new functions use "int status" lets just fix all of ice_ptp_hw.c to use "int err". This will match the new functions and ensures a consistent style across at least the PTP related files. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-21 09:11:02 -08:00
Jacob Keller	e59d75dd41	ice: PTP: move setting of tstamp_config The tstamp_config structure is being set inside of ice_ptp_cfg_timestamp, which is the function used to set Tx and Rx timestamping during initialization. This function is also used in order to set the PHY port timestamping status. However, it makes sense to always set the tstamp_config directly whenever the ice_set_tx_tstamp or ice_set_rx_tstamp functions are called. Move assignment of tstamp_config into the related functions and out of ice_ptp_cfg_timestamp. Now that we assign the timestamp mode in the relevant functions, we no longer modify the config value in ice_set_timestamp_mode. In turn, we no longer want to copy that config value into the PF cached structure. Instead, this is now the source of truth for actual configuration. On success of ice_set_timestamp_mode, copy the real configured mode back to report it out to userspace. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-21 09:11:02 -08:00
Jacob Keller	78267d0c9c	ice: introduce ice_base_incval function A future change will add additional possible increment values for the E822 device support. To handle this, we want to look up the increment value to use instead of hard coding it to the nominal value for E810 devices. Introduce ice_base_incval as a function to get the best nominal increment value to use. For now, it just returns the E810 value, but will be refactored in the future to look up the value based on the device type and configured clock frequency. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-21 09:11:02 -08:00
Karol Kolacinski	4809671015	ice: Fix E810 PTP reset flow The PF reset does not reset PHC and PHY clocks so it's unnecessary to stop them and reinitialize after the reset. Configuring timestamping changes the VSI fields so it needs to be performed after VSIs are initialized, which was not done in case of a reset. Suggested-by: Patrick Talbert <ptalbert@redhat.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Pasi Vaananen <pvaanane@redhat.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-21 09:09:57 -08:00
Heiner Kallweit	ac8c58f5b5	igb: fix deadlock caused by taking RTNL in RPM resume path Recent net core changes caused an issue with few Intel drivers (reportedly igb), where taking RTNL in RPM resume path results in a deadlock. See [0] for a bug report. I don't think the core changes are wrong, but taking RTNL in RPM resume path isn't needed. The Intel drivers are the only ones doing this. See [1] for a discussion on the issue. Following patch changes the RPM resume path to not take RTNL. [0] https://bugzilla.kernel.org/show_bug.cgi?id=215129 [1] https://lore.kernel.org/netdev/20211125074949.5f897431@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/t/ Fixes: `bd869245a3` ("net: core: try to runtime-resume detached device in __dev_open") Fixes: `f32a213765` ("ethtool: runtime-resume netdev parent before ethtool ioctl ops") Tested-by: Martin Stolpe <martin.stolpe@gmail.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20211220201844.2714498-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-20 18:48:09 -08:00
Brett Creeley	92fc508598	iavf: Restrict maximum VLAN filters for VIRTCHNL_VF_OFFLOAD_VLAN_V2 For VIRTCHNL_VF_OFFLOAD_VLAN, PF's would limit the number of VLAN filters a VF was allowed to add. However, by the time the opcode failed, the VLAN netdev had already been added. VIRTCHNL_VF_OFFLOAD_VLAN_V2 added the ability for a PF to tell the VF how many VLAN filters it's allowed to add. Make changes to support that functionality. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-17 12:37:19 -08:00
Brett Creeley	8afadd1cd8	iavf: Add support for VIRTCHNL_VF_OFFLOAD_VLAN_V2 offload enable/disable The new VIRTCHNL_VF_OFFLOAD_VLAN_V2 capability added support that allows the VF to support 802.1Q and 802.1ad VLAN insertion and stripping if successfully negotiated via VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS. Multiple changes were needed to support this new functionality. 1. Added new aq_required flags to support any kind of VLAN stripping and insertion offload requests via virtchnl. 2. Added the new method iavf_set_vlan_offload_features() that's used during VF initialization, VF reset, and iavf_set_features() to set the aq_required bits based on the current VLAN offload configuration of the VF's netdev. 3. Added virtchnl handling for VIRTCHNL_OP_ENABLE_STRIPPING_V2, VIRTCHNL_OP_DISABLE_STRIPPING_V2, VIRTCHNL_OP_ENABLE_INSERTION_V2, and VIRTCHNL_OP_ENABLE_INSERTION_V2. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-17 12:37:19 -08:00
Brett Creeley	ccd219d2ea	iavf: Add support for VIRTCHNL_VF_OFFLOAD_VLAN_V2 hotpath The new VIRTCHNL_VF_OFFLOAD_VLAN_V2 capability added support that allows the PF to set the location of the Tx and Rx VLAN tag for insertion and stripping offloads. In order to support this functionality a few changes are needed. 1. Add a new method to cache the VLAN tag location based on negotiated capabilities for the Tx and Rx ring flags. This needs to be called in the initialization and reset paths. 2. Refactor the transmit hotpath to account for the new Tx ring flags. When IAVF_TXR_FLAGS_VLAN_LOC_L2TAG2 is set, then the driver needs to insert the VLAN tag in the L2TAG2 field of the transmit descriptor. When the IAVF_TXRX_FLAGS_VLAN_LOC_L2TAG1 is set, then the driver needs to use the l2tag1 field of the data descriptor (same behavior as before). 3. Refactor the iavf_tx_prepare_vlan_flags() function to simplify transmit hardware VLAN offload functionality by only depending on the skb_vlan_tag_present() function. This can be done because the OS won't request transmit offload for a VLAN unless the driver told the OS it's supported and enabled. 4. Refactor the receive hotpath to account for the new Rx ring flags and VLAN ethertypes. This requires checking the Rx ring flags and descriptor status bits to determine the location of the VLAN tag. Also, since only a single ethertype can be supported at a time, check the enabled netdev features before specifying a VLAN ethertype in __vlan_hwaccel_put_tag(). Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-17 12:37:19 -08:00
Brett Creeley	48ccc43ecf	iavf: Add support VIRTCHNL_VF_OFFLOAD_VLAN_V2 during netdev config Based on VIRTCHNL_VF_OFFLOAD_VLAN_V2, the VF can now support more VLAN capabilities (i.e. 802.1AD offloads and filtering). In order to communicate these capabilities to the netdev layer, the VF needs to parse its VLAN capabilities based on whether it was able to negotiation VIRTCHNL_VF_OFFLOAD_VLAN or VIRTCHNL_VF_OFFLOAD_VLAN_V2 or neither of these. In order to support this, add the following functionality: iavf_get_netdev_vlan_hw_features() - This is used to determine the VLAN features that the underlying hardware supports and that can be toggled off/on based on the negotiated capabiltiies. For example, if VIRTCHNL_VF_OFFLOAD_VLAN_V2 was negotiated, then any capability marked with VIRTCHNL_VLAN_TOGGLE can be toggled on/off by the VF. If VIRTCHNL_VF_OFFLOAD_VLAN was negotiated, then only VLAN insertion and/or stripping can be toggled on/off. iavf_get_netdev_vlan_features() - This is used to determine the VLAN features that the underlying hardware supports and that should be enabled by default. For example, if VIRTHCNL_VF_OFFLOAD_VLAN_V2 was negotiated, then any supported capability that has its ethertype_init filed set should be enabled by default. If VIRTCHNL_VF_OFFLOAD_VLAN was negotiated, then filtering, stripping, and insertion should be enabled by default. Also, refactor iavf_fix_features() to take into account the new capabilities. To do this, query all the supported features (enabled by default and toggleable) and make sure the requested change is supported. If VIRTCHNL_VF_OFFLOAD_VLAN_V2 is successfully negotiated, there is no need to check VIRTCHNL_VLAN_TOGGLE here because the driver already told the netdev layer which features can be toggled via netdev->hw_features during iavf_process_config(), so only those features will be requested to change. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-17 12:37:19 -08:00
Brett Creeley	209f2f9c71	iavf: Add support for VIRTCHNL_VF_OFFLOAD_VLAN_V2 negotiation In order to support the new VIRTCHNL_VF_OFFLOAD_VLAN_V2 capability the VF driver needs to rework it's initialization state machine and reset flow. This has to be done because successful negotiation of VIRTCHNL_VF_OFFLOAD_VLAN_V2 requires the VF driver to perform a second capability request via VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS before configuring the adapter and its netdev. Add the VIRTCHNL_VF_OFFLOAD_VLAN_V2 bit when sending the VIRTHCNL_OP_GET_VF_RESOURECES message. The underlying PF will either support VIRTCHNL_VF_OFFLOAD_VLAN or VIRTCHNL_VF_OFFLOAD_VLAN_V2 or neither. Both of these offloads should never be supported together. Based on this, add 2 new states to the initialization state machine: __IAVF_INIT_GET_OFFLOAD_VLAN_V2_CAPS __IAVF_INIT_CONFIG_ADAPTER The __IAVF_INIT_GET_OFFLOAD_VLAN_V2_CAPS state is used to request/store the new VLAN capabilities if and only if VIRTCHNL_VLAN_OFFLOAD_VLAN_V2 was successfully negotiated in the __IAVF_INIT_GET_RESOURCES state. The __IAVF_INIT_CONFIG_ADAPTER state is used to configure the adapter/netdev after the resource requests have finished. The VF will move into this state regardless of whether it successfully negotiated VIRTCHNL_VF_OFFLOAD_VLAN or VIRTCHNL_VF_OFFLOAD_VLAN_V2. Also, add a the new flag IAVF_FLAG_AQ_GET_OFFLOAD_VLAN_V2_CAPS and set it during VF reset. If VIRTCHNL_VF_OFFLOAD_VLAN_V2 was successfully negotiated then the VF will request its VLAN capabilities via VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS during the reset. This is needed because the PF may change/modify the VF's configuration during VF reset (i.e. modifying the VF's port VLAN configuration). This also, required the VF to call netdev_update_features() since its VLAN features may change during VF reset. Make sure to call this under rtnl_lock(). Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-17 12:37:18 -08:00
Maciej Fijalkowski	dcbaf72aa4	ice: xsk: fix cleaned_count setting Currently cleaned_count is initialized to ICE_DESC_UNUSED(rx_ring) and later on during the Rx processing it is incremented per each frame that driver consumed. This can result in excessive buffers requested from xsk pool based on that value. To address this, just drop cleaned_count and pass ICE_DESC_UNUSED(rx_ring) directly as a function argument to ice_alloc_rx_bufs_zc(). Idea is to ask for buffers as many as consumed. Let us also call ice_alloc_rx_bufs_zc unconditionally at the end of ice_clean_rx_irq_zc. This has been changed in that way for corresponding ice_clean_rx_irq, but not here. Fixes: `2d4238f556` ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-17 11:18:21 -08:00
Maciej Fijalkowski	8bea15ab74	ice: xsk: allow empty Rx descriptors on XSK ZC data path Commit `ac6f733a7b` ("ice: allow empty Rx descriptors") stated that ice HW can produce empty descriptors that are valid and they should be processed. Add this support to xsk ZC path to avoid potential processing problems. Fixes: `2d4238f556` ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-17 11:16:55 -08:00
Maciej Fijalkowski	8b51a13c37	ice: xsk: do not clear status_error0 for ntu + nb_buffs descriptor The descriptor that ntu is pointing at when we exit ice_alloc_rx_bufs_zc() should not have its corresponding DD bit cleared as descriptor is not allocated in there and it is not valid for HW usage. The allocation routine at the entry will fill the descriptor that ntu points to after it was set to ntu + nb_buffs on previous call. Even the spec says: "The tail pointer should be set to one descriptor beyond the last empty descriptor in host descriptor ring." Therefore, step away from clearing the status_error0 on ntu + nb_buffs descriptor. Fixes: `db804cfc21` ("ice: Use the xsk batched rx allocation interface") Reported-by: Elza Mathew <elza.mathew@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-17 11:15:15 -08:00
Alexander Lobakin	0708b6facb	ice: remove dead store on XSK hotpath The 'if (ntu == rx_ring->count)' block in ice_alloc_rx_buffers_zc() was previously residing in the loop, but after introducing the batched interface it is used only to wrap-around the NTU descriptor, thus no more need to assign 'xdp'. Fixes: `db804cfc21` ("ice: Use the xsk batched rx allocation interface") Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-17 11:12:05 -08:00
Maciej Fijalkowski	617f3e1b58	ice: xsk: allocate separate memory for XDP SW ring Currently, the zero-copy data path is reusing the memory region that was initially allocated for an array of struct ice_rx_buf for its own purposes. This is error prone as it is based on the ice_rx_buf struct always being the same size or bigger than what the zero-copy path needs. There can also be old values present in that array giving rise to errors when the zero-copy path uses it. Fix this by freeing the ice_rx_buf region and allocating a new array for the zero-copy path that has the right length and is initialized to zero. Fixes: `57f7f8b6bc` ("ice: Use xdp_buf instead of rx_buf for xsk zero-copy") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-17 11:09:04 -08:00
Maciej Fijalkowski	afe8a3ba85	ice: xsk: return xsk buffers back to pool when cleaning the ring Currently we only NULL the xdp_buff pointer in the internal SW ring but we never give it back to the xsk buffer pool. This means that buffers can be leaked out of the buff pool and never be used again. Add missing xsk_buff_free() call to the routine that is supposed to clean the entries that are left in the ring so that these buffers in the umem can be used by other sockets. Also, only go through the space that is actually left to be cleaned instead of a whole ring. Fixes: `2d4238f556` ("ice: Add support for AF_XDP") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-17 11:07:04 -08:00
Jakub Kicinski	7cd2802d74	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-16 16:13:19 -08:00
Cyril Novikov	bf0a375055	ixgbe: set X550 MDIO speed before talking to PHY The MDIO bus speed must be initialized before talking to the PHY the first time in order to avoid talking to it using a speed that the PHY doesn't support. This fixes HW initialization error -17 (IXGBE_ERR_PHY_ADDR_INVALID) on Denverton CPUs (a.k.a. the Atom C3000 family) on ports with a 10Gb network plugged in. On those devices, HLREG0[MDCSPD] resets to 1, which combined with the 10Gb network results in a 24MHz MDIO speed, which is apparently too fast for the connected PHY. PHY register reads over MDIO bus return garbage, leading to initialization failure. Reproduced with Linux kernel 4.19 and 5.15-rc7. Can be reproduced using the following setup: * Use an Atom C3000 family system with at least one X552 LAN on the SoC * Disable PXE or other BIOS network initialization if possible (the interface must not be initialized before Linux boots) * Connect a live 10Gb Ethernet cable to an X550 port * Power cycle (not reset, doesn't always work) the system and boot Linux * Observe: ixgbe interfaces w/ 10GbE cables plugged in fail with error -17 Fixes: `e84db72727` ("ixgbe: Introduce function to control MDIO speed") Signed-off-by: Cyril Novikov <cnovikov@lynx.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-15 11:24:39 -08:00
Robert Schlabbach	271225fd57	ixgbe: Document how to enable NBASE-T support Commit `a296d665ea` ("ixgbe: Add ethtool support to enable 2.5 and 5.0 Gbps support") introduced suppression of the advertisement of NBASE-T speeds by default, according to Todd Fujinaka to accommodate customers with network switches which could not cope with advertised NBASE-T speeds, as posted in the E1000-devel mailing list: https://sourceforge.net/p/e1000/mailman/message/37106269/ However, the suppression was not documented at all, nor was how to enable NBASE-T support. Properly document the NBASE-T suppression and how to enable NBASE-T support. Fixes: `a296d665ea` ("ixgbe: Add ethtool support to enable 2.5 and 5.0 Gbps support") Reported-by: Robert Schlabbach <robert_s@gmx.net> Signed-off-by: Robert Schlabbach <robert_s@gmx.net> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-15 11:09:29 -08:00
Sasha Neftin	0182d1f3fa	igc: Fix typo in i225 LTR functions The LTR maximum value was incorrectly written using the scale from the LTR minimum value. This would cause incorrect values to be sent, in cases where the initial calculation lead to different min/max scales. Fixes: `707abf0695` ("igc: Add initial LTR support") Suggested-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-15 11:09:29 -08:00
Letu Ren	b6d335a60d	igbvf: fix double free in `igbvf_probe` In `igbvf_probe`, if register_netdev() fails, the program will go to label err_hw_init, and then to label err_ioremap. In free_netdev() which is just below label err_ioremap, there is `list_for_each_entry_safe` and `netif_napi_del` which aims to delete all entries in `dev->napi_list`. The program has added an entry `adapter->rx_ring->napi` which is added by `netif_napi_add` in igbvf_alloc_queues(). However, adapter->rx_ring has been freed below label err_hw_init. So this a UAF. In terms of how to patch the problem, we can refer to igbvf_remove() and delete the entry before `adapter->rx_ring`. The KASAN logs are as follows: [ 35.126075] BUG: KASAN: use-after-free in free_netdev+0x1fd/0x450 [ 35.127170] Read of size 8 at addr ffff88810126d990 by task modprobe/366 [ 35.128360] [ 35.128643] CPU: 1 PID: 366 Comm: modprobe Not tainted 5.15.0-rc2+ #14 [ 35.129789] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [ 35.131749] Call Trace: [ 35.132199] dump_stack_lvl+0x59/0x7b [ 35.132865] print_address_description+0x7c/0x3b0 [ 35.133707] ? free_netdev+0x1fd/0x450 [ 35.134378] __kasan_report+0x160/0x1c0 [ 35.135063] ? free_netdev+0x1fd/0x450 [ 35.135738] kasan_report+0x4b/0x70 [ 35.136367] free_netdev+0x1fd/0x450 [ 35.137006] igbvf_probe+0x121d/0x1a10 [igbvf] [ 35.137808] ? igbvf_vlan_rx_add_vid+0x100/0x100 [igbvf] [ 35.138751] local_pci_probe+0x13c/0x1f0 [ 35.139461] pci_device_probe+0x37e/0x6c0 [ 35.165526] [ 35.165806] Allocated by task 366: [ 35.166414] ____kasan_kmalloc+0xc4/0xf0 [ 35.167117] foo_kmem_cache_alloc_trace+0x3c/0x50 [igbvf] [ 35.168078] igbvf_probe+0x9c5/0x1a10 [igbvf] [ 35.168866] local_pci_probe+0x13c/0x1f0 [ 35.169565] pci_device_probe+0x37e/0x6c0 [ 35.179713] [ 35.179993] Freed by task 366: [ 35.180539] kasan_set_track+0x4c/0x80 [ 35.181211] kasan_set_free_info+0x1f/0x40 [ 35.181942] ____kasan_slab_free+0x103/0x140 [ 35.182703] kfree+0xe3/0x250 [ 35.183239] igbvf_probe+0x1173/0x1a10 [igbvf] [ 35.184040] local_pci_probe+0x13c/0x1f0 Fixes: `d4e0fe01a3` (igbvf: add new driver to support 82576 virtual functions) Reported-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Letu Ren <fantasquex@gmail.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-15 11:09:21 -08:00
Karen Sornek	584af82154	igb: Fix removal of unicast MAC filters of VFs Move checking condition of VF MAC filter before clearing or adding MAC filter to VF to prevent potential blackout caused by removal of necessary and working VF's MAC filter. Fixes: `1b8b062a99` ("igb: add VF trust infrastructure") Signed-off-by: Karen Sornek <karen.sornek@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-15 11:08:18 -08:00
Jesse Brandeburg	9c99d099f7	ice: use modern kernel API for kick The kernel gained a new interface for drivers to use to combine tail bump (doorbell) and BQL updates, attempt to use those new interfaces. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-15 08:49:25 -08:00
Jesse Brandeburg	21c6e36b1e	ice: tighter control over VSI_DOWN state The driver had comments to the effect of: This flag should be set before calling this function. While reviewing code it was found that there were several violations of this policy, which could introduce hard to find bugs or races. Fix the violations of the "VSI DOWN state must be set before calling ice_down" and make checking the state into code with a WARN_ON. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-15 08:48:26 -08:00
Jesse Brandeburg	cc14db11c8	ice: use prefetch methods The kernel provides some prefetch mechanisms to speed up commonly cold cache line accesses during receive processing. Since these are software structures it helps to have these strategically placed prefetches. Be careful to call BQL prefetch complete only for non XDP queues. Co-developed-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-15 08:46:28 -08:00
Jesse Brandeburg	1c96c16858	ice: update to newer kernel API Use the netif_tx_* API from netdevice.h which has simpler parameters. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-15 08:45:28 -08:00
Jacob Keller	399e27dbbd	ice: support immediate firmware activation via devlink reload The ice hardware contains an embedded chip with firmware which can be updated using devlink flash. The firmware which runs on this chip is referred to as the Embedded Management Processor firmware (EMP firmware). Activating the new firmware image currently requires that the system be rebooted. This is not ideal as rebooting the system can cause unwanted downtime. In practical terms, activating the firmware does not always require a full system reboot. In many cases it is possible to activate the EMP firmware immediately. There are a couple of different scenarios to cover. * The EMP firmware itself can be reloaded by issuing a special update to the device called an Embedded Management Processor reset (EMP reset). This reset causes the device to reset and reload the EMP firmware. * PCI configuration changes are only reloaded after a cold PCIe reset. Unfortunately there is no generic way to trigger this for a PCIe device without a system reboot. When performing a flash update, firmware is capable of responding with some information about the specific update requirements. The driver updates the flash by programming a secondary inactive bank with the contents of the new image, and then issuing a command to request to switch the active bank starting from the next load. The response to the final command for updating the inactive NVM flash bank includes an indication of the minimum reset required to fully update the device. This can be one of the following: * A full power on is required * A cold PCIe reset is required * An EMP reset is required The response to the command to switch flash banks includes an indication of whether or not the firmware will allow an EMP reset request. For most updates, an EMP reset is sufficient to load the new EMP firmware without issues. In some cases, this reset is not sufficient because the PCI configuration space has changed. When this could cause incompatibility with the new EMP image, the firmware is capable of rejecting the EMP reset request. Add logic to ice_fw_update.c to handle the response data flash update AdminQ commands. For the reset level, issue a devlink status notification informing the user of how to complete the update with a simple suggestion like "Activate new firmware by rebooting the system". Cache the status of whether or not firmware will restrict the EMP reset for use in implementing devlink reload. Implement support for devlink reload with the "fw_activate" flag. This allows user space to request the firmware be activated immediately. For the .reload_down handler, we will issue a request for the EMP reset using the appropriate firmware AdminQ command. If we know that the firmware will not allow an EMP reset, simply exit with a suitable netlink extended ACK message indicating that the EMP reset is not available. For the .reload_up handler, simply wait until the driver has finished resetting. Logic to handle processing of an EMP reset already exists in the driver as part of its reset and rebuild flows. Implement support for the devlink reload interface with the "fw_activate" action. This allows userspace to request activation of firmware without a reboot. Note that support for indicating the required reset and EMP reset restriction is not supported on old versions of firmware. The driver can determine if the two features are supported by checking the device capabilities report. I confirmed support has existed since at least version 5.5.2 as reported by the 'fw.mgmt' version. Support to issue the EMP reset request has existed in all version of the EMP firmware for the ice hardware. Check the device capabilities report to determine whether or not the indications are reported by the running firmware. If the reset requirement indication is not supported, always assume a full power on is necessary. If the reset restriction capability is not supported, always assume the EMP reset is available. Users can verify if the EMP reset has activated the firmware by using the devlink info report to check that the 'running' firmware version has updated. For example a user might do the following: # Check current version $ devlink dev info # Update the device $ devlink dev flash pci/0000:af:00.0 file firmware.bin # Confirm stored version updated $ devlink dev info # Reload to activate new firmware $ devlink dev reload pci/0000:af:00.0 action fw_activate # Confirm running version updated $ devlink dev info Finally, this change does not implement basic driver-only reload support. I did look into trying to do this. However, it requires significant refactor of how the ice driver probes and loads everything. The ice driver probe and allocation flows were not designed with such a reload in mind. Refactoring the flow to support this is beyond the scope of this change. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-15 08:40:38 -08:00
Jacob Keller	af18d8866c	ice: reduce time to read Option ROM CIVD data During probe and device reset, the ice driver reads some data from the NVM image as part of ice_init_nvm. Part of this data includes a section of the Option ROM which contains version information. The function ice_get_orom_civd_data is used to locate the '$CIV' data section of the Option ROM. Timing of ice_probe and ice_rebuild indicate that the ice_get_orom_civd_data function takes about 10 seconds to finish executing. The function locates the section by scanning the Option ROM every 512 bytes. This requires a significant number of NVM read accesses, since the Option ROM bank is 500KB. In the worst case it would take about 1000 reads. Worse, all PFs serialize this operation during reload because of acquiring the NVM semaphore. The CIVD section is located at the end of the Option ROM image data. Unfortunately, the driver has no easy method to determine the offset manually. Practical experiments have shown that the data could be at a variety of locations, so simply reversing the scanning order is not sufficient to reduce the overall read time. Instead, copy the entire contents of the Option ROM into memory. This allows reading the data using 4Kb pages instead of 512 bytes at a time. This reduces the total number of firmware commands by a factor of 8. In addition, reading the whole section together at once allows better indication to firmware of when we're "done". Re-write ice_get_orom_civd_data to allocate virtual memory to store the Option ROM data. Copy the entire OptionROM contents at once using ice_read_flash_module. Finally, use this memory copy to scan for the '$CIV' section. This change significantly reduces the time to read the Option ROM CIVD section from ~10 seconds down to ~1 second. This has a significant impact on the total time to complete a driver rebuild or probe. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-15 08:38:14 -08:00
Jacob Keller	c9f7a483e4	ice: move ice_devlink_flash_update and merge with ice_flash_pldm_image The ice_devlink_flash_update function performs a few upfront checks and then calls ice_flash_pldm_image. Most if these checks make more sense in the context of code within ice_flash_pldm_image. Merge ice_devlink_flash_update and ice_flash_pldm_image into one function, placing it in ice_fw_update.c Since this is still the entry point for devlink, call the function ice_devlink_flash_update instead of ice_flash_pldm_image. This leaves a single function which handles the devlink parameters and then initiates a PLDM update. With this change, the ice_devlink_flash_update function in ice_fw_update.c becomes the main entry point for flash update. It elimintes some unnecessary boiler plate code between the two previous functions. The ultimate motivation for this is that it eases supporting a dry run with the PLDM library in a future change. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-15 08:37:14 -08:00
Jacob Keller	c356eaa824	ice: move and rename ice_check_for_pending_update The ice_devlink_flash_update function performs a few checks and then calls ice_flash_pldm_image. One of these checks is to call ice_check_for_pending_update. This function checks if the device has a pending update, and cancels it if so. This is necessary to allow a new flash update to proceed. We want to refactor the ice code to eliminate ice_devlink_flash_update, moving its checks into ice_flash_pldm_image. To do this, ice_check_for_pending_update will become static, and only called by ice_flash_pldm_image. To make this change easier to review, first just move the function up within the ice_fw_update.c file. While at it, note that the function has a misleading name. Its primary action is to cancel a pending update. Using the verb "check" does not imply this. Rename it to ice_cancel_pending_update. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-15 08:36:10 -08:00
Jacob Keller	78ad87da99	ice: devlink: add shadow-ram region to snapshot Shadow RAM We have a region for reading the contents of the NVM flash as a snapshot. This region does not allow reading the Shadow RAM, as it always passes the FLASH_ONLY bit to the low level firmware interface. Add a separate shadow-ram region which will allow snapshot of the current contents of the Shadow RAM. This data is built from the NVM contents but is distinct as the device builds up the Shadow RAM during initialization, so being able to snapshot its contents can be useful when attempting to debug flash related issues. Fix the comment description of the nvm-flash region which incorrectly stated that it filled the shadow-ram region, and add a comment explaining that the nvm-flash region does not actually read the Shadow RAM. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-15 08:34:54 -08:00
Karol Kolacinski	37e738b6fd	ice: Don't put stale timestamps in the skb The driver has to check if it does not accidentally put the timestamp in the SKB before previous timestamp gets overwritten. Timestamp values in the PHY are read only and do not get cleared except at hardware reset or when a new timestamp value is captured. The cached_tstamp field is used to detect the case where a new timestamp has not yet been captured, ensuring that we avoid sending stale timestamp data to the stack. Fixes: `ea9b847cda` ("ice: enable transmit timestamps for E810 devices") Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-14 11:31:47 -08:00
Karol Kolacinski	0013881c11	ice: Use div64_u64 instead of div_u64 in adjfine Change the division in ice_ptp_adjfine from div_u64 to div64_u64. div_u64 is used when the divisor is 32 bit but in this case incval is 64 bit and it caused incorrect calculations and incval adjustments. Fixes: `06c16d89d2` ("ice: register 1588 PTP clock device object for E810 devices") Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-14 11:29:34 -08:00
Tony Nguyen	f8a3bcceb4	ice: Remove unused ICE_FLOW_SEG_HDRS_L2_MASK Remove the unused define ICE_FLOW_SEG_HDRS_L2_MASK. Reported-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Acked-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Gurucharan G <gurucharanx.g@intel.com>	2021-12-14 10:19:14 -08:00

1 2 3 4 5 ...

7472 commits